The principles of Pi ODA (paper interface to office document architecture), a model-based document analysis system developed as a prototype for analyzing single-sided business letters in German, are presented. Pi ODA first extracts a part-of hierarchy of nested layout objects, such as text blocks, lines, and words, based on their presentation on the page. In a subsequent step called logical labeling, the layout objects and their compositions are analyzed geometrically to identify the corresponding logical objects, which carry a human-perceptible meaning, such as the sender, recipient, and date of a letter. Context-sensitive text recognition is then applied to the logical objects, using logical vocabularies and syntactic knowledge. The result is a document representation that conforms to the ODA international standard.
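The three stages described above (layout hierarchy, logical labeling by geometry, vocabulary-constrained recognition) can be illustrated with a minimal Python sketch. It is not Pi ODA's implementation: the `LayoutObject` class, the page regions in `RULES`, the month list in `VOCABULARIES`, and the use of fuzzy string matching from the standard `difflib` module are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from difflib import get_close_matches
from typing import Iterator, List, Optional, Tuple


@dataclass
class LayoutObject:
    """A node in the part-of hierarchy: a text block, a line, or a word."""
    kind: str                 # "block", "line", or "word"
    x: float = 0.0            # left edge, as a fraction of page width
    y: float = 0.0            # top edge, as a fraction of page height
    text: str = ""            # raw OCR text (used for words only)
    children: List["LayoutObject"] = field(default_factory=list)


# Hypothetical geometric rules: (label, x_min, x_max, y_min, y_max).
# A real system would derive such regions from a document model.
RULES = [
    ("sender",    0.0, 0.5, 0.00, 0.15),
    ("recipient", 0.0, 0.5, 0.15, 0.35),
    ("date",      0.5, 1.0, 0.15, 0.35),
]

# Hypothetical logical vocabularies, keyed by logical label.
VOCABULARIES = {
    "date": ["Januar", "Februar", "März", "April", "Mai", "Juni"],
}


def label_block(block: LayoutObject) -> Optional[str]:
    """Logical labeling: map a block to a label from its page position."""
    for label, x0, x1, y0, y1 in RULES:
        if x0 <= block.x < x1 and y0 <= block.y < y1:
            return label
    return None


def correct_word(raw: str, label: Optional[str]) -> str:
    """Replace an OCR word by the closest vocabulary entry, if one is close."""
    vocab = VOCABULARIES.get(label, [])
    matches = get_close_matches(raw, vocab, n=1, cutoff=0.6)
    return matches[0] if matches else raw


def words_of(obj: LayoutObject) -> Iterator[LayoutObject]:
    """Yield all word objects below a layout object, in reading order."""
    if obj.kind == "word":
        yield obj
    for child in obj.children:
        yield from words_of(child)


def analyze(blocks: List[LayoutObject]) -> List[Tuple[Optional[str], str]]:
    """Label each block, then recognize its text in the context of the label."""
    result = []
    for block in blocks:
        label = label_block(block)
        text = " ".join(correct_word(w.text, label) for w in words_of(block))
        result.append((label, text))
    return result
```

Calling `analyze` on a list of block objects returns one `(label, text)` pair per block. Because the vocabulary is selected by the logical label, a misread month name is corrected only inside a block labeled as a date, which is the sense in which recognition here is context-sensitive.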