The goal of this research is to detect and read English text occurring in images of natural scenes.
Text present in digital images can provide useful information about the image and about the collection of images it belongs to. On this page we present results of automatic text detection and character recognition in natural scenes for three different approaches to text detection and reading. An important contribution of two of these approaches is that no third-party Optical Character Recognition (OCR) software is needed to read the text.
For all three approaches, we first locate the text regions in the image. In each example below, the first figure shows the initial text detection results, and the second shows the text region after the detection rectangles have been combined using a histogram-binning-based method. We then read the text by recognizing its characters.
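The page does not describe the histogram binning step in detail. The following is a minimal sketch of one way such a step could work, under stated assumptions rather than the authors' method. Each detection rectangle votes for the horizontal row bins it covers. Runs of consecutive bins with enough votes are treated as text lines, and all rectangles centred in a line are merged into one bounding box. The function name `merge_by_row_histogram` and the parameters `bin_size` and `min_votes` are hypothetical.

```python
from typing import List, Tuple

# A rectangle is (x0, y0, x1, y1) in pixel coordinates, with x0 < x1 and y0 < y1.
Rect = Tuple[int, int, int, int]


def merge_by_row_histogram(rects: List[Rect], image_height: int,
                           bin_size: int = 10, min_votes: int = 2) -> List[Rect]:
    """Hypothetical histogram-binning merge of detection rectangles.

    Each rectangle votes for every band of `bin_size` rows that it overlaps.
    Consecutive bins with at least `min_votes` votes form one text line, and
    rectangles whose vertical centre lies in that line are merged into their
    common bounding box. Rectangles in low-vote bins are dropped.
    """
    n_bins = (image_height + bin_size - 1) // bin_size
    votes = [0] * n_bins
    for _, y0, _, y1 in rects:
        last = min((y1 - 1) // bin_size, n_bins - 1)
        for b in range(y0 // bin_size, last + 1):
            votes[b] += 1

    # Collect runs of consecutive bins whose vote count reaches the threshold.
    bands = []
    start = None
    for b, v in enumerate(votes + [0]):  # the trailing 0 closes an open run
        if v >= min_votes and start is None:
            start = b
        elif v < min_votes and start is not None:
            bands.append((start * bin_size, b * bin_size))
            start = None

    # Merge the rectangles centred in each band into a single bounding box.
    merged = []
    for top, bottom in bands:
        members = [r for r in rects if top <= (r[1] + r[3]) // 2 < bottom]
        if members:
            merged.append((min(r[0] for r in members), min(r[1] for r in members),
                           max(r[2] for r in members), max(r[3] for r in members)))
    return merged


detections = [(10, 20, 30, 40), (35, 22, 55, 41), (60, 21, 80, 39), (100, 200, 110, 205)]
print(merge_by_row_histogram(detections, image_height=300))
```

With these inputs, the three overlapping detections on the same row merge into `(10, 20, 80, 41)`. The isolated rectangle near row 200 gets only one vote, so it is discarded, which also filters out some false detections. This sketch merges everything in a band regardless of horizontal gaps, so two separate words on the same line would become one region.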
Text detection before filtering rectangles.
Text detection after filtering rectangles.
'P' wrongly detected as 'D'.
Text region after k-means segmentation.
Text region after binarization.
Final letter recognition and segmentation: here the 'C' of 'CASTLE' was segmented but not recognized.
Text detection before filtering rectangles.
Text detection after filtering rectangles.
The 'E' of 'AGENTS' and a window corner wrongly detected as 'L'.
Text region after k-means segmentation.
Text region after binarization.
Final letter recognition and segmentation: here the 'N' of 'SAXONS' was segmented but not recognized, and 'G' was recognized as 'O'.
Text detection before filtering rectangles.
Text detection after filtering rectangles.
Part of 'S' wrongly detected as 'D'.
Text region after k-means segmentation.
Text region after binarization.
All letters correctly segmented and recognized.
Text detection before filtering rectangles.
Text detection after filtering rectangles.
The 'O' of 'TO' wrongly detected as 'P'.
Text region after k-means segmentation.
Text region after binarization.
The letter 'W', which touches the edge, is detected as 'V'.
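In the examples, each text region is segmented with k-means and then binarized before letters are segmented and recognized. The page does not state the features, number of clusters, or initialization used. The following minimal sketch makes several assumptions. It uses two clusters over RGB values, initialized deterministically at the darkest and brightest pixels. It then applies a hypothetical rule that the smaller cluster is text, which is independent of whether the text is darker or lighter than its background. The functions `kmeans2` and `binarize_text_region` are illustrative names, not the authors' code.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)


def kmeans2(pixels: List[Pixel], iterations: int = 10) -> List[int]:
    """Two-cluster k-means on RGB values; returns a 0/1 cluster label per pixel."""
    # Assumed deterministic initialisation: darkest and brightest pixel by channel sum.
    centres = [list(min(pixels, key=sum)), list(max(pixels, key=sum))]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins the nearest centre (squared Euclidean).
        for i, p in enumerate(pixels):
            d = [sum((a - c) ** 2 for a, c in zip(p, ctr)) for ctr in centres]
            labels[i] = 0 if d[0] <= d[1] else 1
        # Update step: each centre moves to the mean colour of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                centres[k] = [sum(ch) / len(members) for ch in zip(*members)]
    return labels


def binarize_text_region(region: List[List[Pixel]]) -> List[List[int]]:
    """Mark text pixels 1 and background 0, assuming text is the smaller cluster."""
    h, w = len(region), len(region[0])
    flat = [p for row in region for p in row]
    labels = kmeans2(flat)
    text_label = 0 if labels.count(0) < labels.count(1) else 1
    return [[1 if labels[r * w + c] == text_label else 0 for c in range(w)]
            for r in range(h)]


W, D = (250, 250, 250), (10, 10, 20)
dark_on_light = [[W, W, W, W], [W, D, D, W], [W, W, W, W]]
light_on_dark = [[D, D, D, D], [D, W, W, D], [D, D, D, D]]
print(binarize_text_region(dark_on_light))
print(binarize_text_region(light_on_dark))
```

Both calls yield the same mask, with the two text pixels in the middle row set to 1. The minority-cluster rule therefore handles dark-on-light and light-on-dark text alike. It would fail on regions where text covers more than half of the pixels.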
Sabine Süsstrunk
Olivier Küng
This work is supported by the National Competence Center in Research on Mobile Information and Communication Systems (NCCR-MICS), a center supported by the Swiss National Science Foundation under grant number 5005-67322, and by K-Space, the European Network of Excellence in Knowledge Space of semantic inference for automatic annotation and retrieval of multimedia content.