This paper describes a method for extracting words from table regions in document images. The approach consists of two stages: cell detection and word extraction. In the cell detection module, the table frame is first extracted by analyzing connected components, and intersection points within the frame are then detected using masks. False intersections are corrected, and the locations of the cells within the table are determined. In the word extraction module, the text region in each cell is located using the connected-component information obtained during cell detection, and is then segmented into text lines using projection profiles. Finally, the segmented lines are divided into words using gap clustering and special-symbol detection. The method correctly groups character components that touch the table frame with their words, and experimental results show that more than 99% of words were successfully extracted from table regions.
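To make the mask-based intersection step of cell detection concrete, here is a minimal sketch. It is not the authors' masks: it assumes a hypothetical input format (a binary frame image as a list of strings, `#` for frame pixels, one-pixel-wide lines) and uses a simple directional "arm" mask, classifying a frame pixel as a junction when frame pixels extend at least `arm` steps both horizontally and vertically from it. The names `has_arm`, `find_intersections`, and the `arm` parameter are illustrative only, and the paper's false-intersection correction is not reproduced.

```python
from typing import List, Tuple

# Hypothetical input format: binary table-frame image, '#' = frame pixel, '.' = background.
Image = List[str]


def has_arm(img: Image, r: int, c: int, dr: int, dc: int, length: int) -> bool:
    """True if `length` consecutive frame pixels extend from (r, c) in direction (dr, dc)."""
    for step in range(1, length + 1):
        rr, cc = r + dr * step, c + dc * step
        if not (0 <= rr < len(img) and 0 <= cc < len(img[rr])):
            return False  # arm runs off the image
        if img[rr][cc] != '#':
            return False  # arm interrupted by background
    return True


def find_intersections(img: Image, arm: int = 2) -> List[Tuple[int, int, str]]:
    """Return (row, col, kind) for every frame pixel whose arm mask matches a junction.

    kind is 'corner' (2 arms), 'T' (3 arms) or 'cross' (4 arms); a junction must
    have at least one horizontal and at least one vertical arm.
    """
    found = []
    for r, row in enumerate(img):
        for c, px in enumerate(row):
            if px != '#':
                continue
            up = has_arm(img, r, c, -1, 0, arm)
            down = has_arm(img, r, c, 1, 0, arm)
            left = has_arm(img, r, c, 0, -1, arm)
            right = has_arm(img, r, c, 0, 1, arm)
            if not ((up or down) and (left or right)):
                continue  # plain line pixel or isolated blob, not a junction
            n = up + down + left + right  # bools sum as ints
            found.append((r, c, {2: 'corner', 3: 'T', 4: 'cross'}[n]))
    return found
```

On a 2×2 grid drawn with one-pixel lines, this sketch reports four corners, four T-junctions, and one cross. A longer `arm` makes short character strokes touching the frame less likely to be mistaken for junctions, at the cost of missing junctions near very small cells; this trade-off is an assumption of the sketch, not a result from the paper.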
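The word-extraction stage (projection profiles for text lines, gap clustering for words) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the cell's text region is a hypothetical list of strings with `#` for ink, column runs of ink stand in for connected components, and the gap clustering is a swapped-in 1-D 2-means that splits inter-character gaps from inter-word gaps. Special-symbol detection is omitted, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[str]  # hypothetical format: '#' = ink, '.' = background


def runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Maximal [start, end) runs of True values."""
    out: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, f in enumerate(flags + [False]):  # sentinel closes a trailing run
        if f and start is None:
            start = i
        elif not f and start is not None:
            out.append((start, i))
            start = None
    return out


def split_lines(img: Image) -> List[Tuple[int, int]]:
    """Horizontal projection profile: consecutive rows containing ink form a text line."""
    return runs([row.count('#') > 0 for row in img])


def gap_threshold(gaps: List[int], iters: int = 20) -> float:
    """1-D 2-means on gap widths; returns the midpoint between the two cluster means."""
    lo, hi = float(min(gaps)), float(max(gaps))
    for _ in range(iters):
        mid = (lo + hi) / 2
        small = [g for g in gaps if g <= mid]
        large = [g for g in gaps if g > mid]
        if not large:  # all gaps equal: no inter-word gap can be distinguished
            return hi
        lo, hi = sum(small) / len(small), sum(large) / len(large)
    return (lo + hi) / 2


def split_words(img: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Vertical projection within one text line, then gap clustering into word spans."""
    band = img[top:bottom]
    width = max(len(r) for r in band)
    cols = [any(c < len(r) and r[c] == '#' for r in band) for c in range(width)]
    blobs = runs(cols)
    if len(blobs) < 2:
        return blobs
    gaps = [b[0] - a[1] for a, b in zip(blobs, blobs[1:])]
    thr = gap_threshold(gaps)
    words, start = [], blobs[0][0]
    for (a, b), g in zip(zip(blobs, blobs[1:]), gaps):
        if g > thr:  # wide gap: close the current word
            words.append((start, a[1]))
            start = b[0]
    words.append((start, blobs[-1][1]))
    return words
```

For example, a line whose ink runs are separated by gaps of 1, 3, and 1 columns gets a threshold of 2 and splits into two words. With only one gap, or with all gaps equal, the sketch cannot tell character gaps from word gaps and keeps the line as a single word; a real system would need extra cues such as the character height.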