Abstract: This paper proposes a new method for extracting information from document images, intended as the basis for a document reader that extracts required keywords and their logical relationships from a variety of printed documents. The OCR output for such documents may contain not only unknown words and compound words but also words that are incorrect because of OCR errors. To cope with OCR errors, the proposed method adopts robust keyword matching, which searches for a string pattern in two-dimensional OCR results consisting of sets of possible character candidates. To deal with these difficulties, the keyword matching uses a keyword dictionary that includes word segments and incorrect words containing typical OCR errors. After keyword matching, global document matching is carried out between the keyword matching results for an input document and document models, which consist of keyword models and their logical relationships. This global matching determines the most suitable model for the input document and solves word segmentation problems accurately, even if the document contains unknown, compound, or incorrect words. Experimental results on 100 documents show that the method is robust and effective for various document structures.
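The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the OCR result is modeled as one text line with a set of candidate characters per position, the keyword dictionary maps each spelling variant (including hypothetical OCR-error variants such as "T0TAL") to a canonical keyword, and a document model is reduced to a set of expected keywords with no logical relationships. The names `match_keywords` and `select_model` are hypothetical, and word segmentation is not handled.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: one set of candidate characters per character position.
Lattice = List[Set[str]]


def match_keywords(lattice: Lattice, variants: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, canonical_keyword) for every dictionary variant
    that can be spelled from the candidate sets starting at `start`."""
    hits = []
    for start in range(len(lattice)):
        for variant, keyword in variants.items():
            end = start + len(variant)
            if not variant or end > len(lattice):
                continue
            # A variant matches if each of its characters is among the
            # candidates at the corresponding position.
            if all(ch in lattice[start + i] for i, ch in enumerate(variant)):
                hits.append((start, end, keyword))
    return hits


def select_model(hits: List[Tuple[int, int, str]], models: Dict[str, Set[str]]) -> str:
    """Pick the model whose expected keywords are best covered by the hits.
    The coverage score is an illustrative stand-in for global matching."""
    found = {keyword for _, _, keyword in hits}

    def coverage(name: str) -> float:
        expected = models[name]
        return len(expected & found) / len(expected) if expected else 0.0

    return max(models, key=coverage)


# Example: OCR never proposes "O" at position 1, so only the error variant matches.
lattice = [{"T"}, {"0", "D"}, {"T"}, {"A"}, {"L", "1"}]
variants = {"TOTAL": "total", "T0TAL": "total", "DATE": "date"}
models = {"invoice_form": {"invoice", "total"}, "receipt": {"total"}}

hits = match_keywords(lattice, variants)
best = select_model(hits, models)
```

In this example the correct spelling "TOTAL" fails because "O" is not among the candidates, while the dictionary's error variant "T0TAL" still recovers the keyword. That is the role the abstract assigns to the incorrect-word entries in the keyword dictionary.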