Current Optical Character Recognition (OCR) systems are not capable of detecting and recognizing isolated words in an image, especially when the text is not horizontal. Such text blocks are typical of charts and graphs. This paper proposes an algorithm for detecting small text blocks of arbitrary orientation, color, style, and font size, which can be used to localize text before any character recognition system is applied. According to the experimental results, using the proposed algorithm to determine the location and orientation of text blocks on charts and graphs, and passing this information to a text recognition system, increases recall (completeness) by a factor of 20 and text recognition precision by a factor of 15. The experiments were carried out on a test collection of 1,000 charts containing about 14,000 text blocks, generated with the XML/SWF Chart tool.
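The abstract does not describe how orientation is determined, so the following is not the paper's method. It is a minimal sketch of one standard technique, estimating a block's principal axis from second-order central moments of its foreground pixels and then rotating the block to horizontal before handing it to an OCR engine. It assumes each text block has already been segmented into a list of (x, y) foreground pixel coordinates; the function names `estimate_orientation` and `deskew` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _centroid(points: List[Point]) -> Point:
    """Mean x and y of the foreground pixels of one text block."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def estimate_orientation(points: List[Point]) -> float:
    """Principal-axis angle of a text block in degrees, in (-90, 90].

    Hypothetical helper: uses central second-order moments (mu20, mu02, mu11)
    of the pixel coordinates. The angle is measured from the x-axis toward the
    y-axis in the same coordinate frame as the input points.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to estimate an orientation")
    cx, cy = _centroid(points)
    n = len(points)
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    # atan2 returns a value in (-pi, pi], so half of it lies in (-pi/2, pi/2].
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return math.degrees(theta)


def deskew(points: List[Point], angle_deg: float) -> List[Point]:
    """Rotate points about their centroid by -angle_deg (hypothetical helper).

    After this rotation the block's principal axis is parallel to the x-axis,
    which is the layout a horizontal-text OCR engine expects.
    """
    cx, cy = _centroid(points)
    t = math.radians(-angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [
        (cx + (x - cx) * c - (y - cy) * s, cy + (x - cx) * s + (y - cy) * c)
        for x, y in points
    ]


if __name__ == "__main__":
    # Synthetic 40x5 block of "ink" tilted by 30 degrees.
    a = math.radians(30)
    block = [
        (u * math.cos(a) - v * math.sin(a), u * math.sin(a) + v * math.cos(a))
        for u in range(40)
        for v in range(5)
    ]
    angle = estimate_orientation(block)
    print(f"estimated angle: {angle:.3f} degrees")
    flat = deskew(block, angle)
    print(f"angle after deskew: {estimate_orientation(flat):.3f} degrees")
```

The moment-based angle is defined only modulo 180°, so it cannot distinguish upright text from upside-down text; a pipeline built on this idea would need another cue, or would have to try both orientations with the recognizer.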