From Paper to Office Document Standard Representation

Authors:
Andreas Dengel;Rainer Bleisinger;Rainer Hoch;Frank Fein;Frank Hönes
Venue:
Computer
Year:
1992

Citing 1

Office Document Architecture and Office Document Interchange Formats: Current Status of International Standardization
Computer

Cited 6

Using IR techniques for text classification in document analysis
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval

Geometric Structure Analysis of Document Images: A Knowledge-Based Approach
IEEE Transactions on Pattern Analysis and Machine Intelligence

Structure analysis and generation for internet documents
Intelligent exploration of the web

Logical Structure Analysis and Generation for Structured Documents: A Syntactic Approach
IEEE Transactions on Knowledge and Data Engineering

Relational Data Mining and ILP for Document Image Understanding
Applied Artificial Intelligence

Constraint solving over OCR graphs
INAP'01 Proceedings of the Applications of Prolog 14th international conference on Web knowledge management and decision support


Abstract

This paper presents the principles of the model-based document analysis system Pi ODA (paper interface to office document architecture), which was developed as a prototype for analyzing single-sided business letters in German. Pi ODA first extracts a part-of hierarchy of nested layout objects, such as text blocks, lines, and words, based on their presentation on the page. In a subsequent step called logical labeling, the layout objects and their compositions are analyzed geometrically to identify the corresponding logical objects, which carry a human-perceptible meaning, such as the sender, recipient, and date of a letter. Context-sensitive text recognition is then applied to the logical objects, using logical vocabularies and syntactic knowledge. As a result, Pi ODA produces a document representation that conforms to the ODA international standard.
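
The three stages described in the abstract (part-of layout hierarchy, geometric logical labeling, vocabulary-constrained recognition) can be illustrated with a minimal Python sketch. This is not the Pi ODA implementation: the class `LayoutObject`, the page regions in `REGIONS`, the word list in `VOCABULARIES`, and the sample letter are all hypothetical, and fuzzy string matching from the standard library stands in for the system's context-sensitive recognition.

```python
from dataclasses import dataclass, field
from difflib import get_close_matches


@dataclass
class LayoutObject:
    """One node of the part-of hierarchy: page > block > line > word."""
    kind: str                 # "page", "block", "line", or "word"
    bbox: tuple               # (left, top, right, bottom) as fractions of the page
    text: str = ""            # raw OCR output, only meaningful for words
    children: list = field(default_factory=list)


# Assumed geometric rules: where each logical object is expected on the page.
REGIONS = {
    "sender": (0.0, 0.00, 0.6, 0.15),
    "recipient": (0.0, 0.15, 0.6, 0.35),
    "date": (0.6, 0.15, 1.0, 0.35),
}

# Assumed logical vocabulary: words that are plausible inside a given object.
VOCABULARIES = {
    "date": ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
             "August", "September", "Oktober", "November", "Dezember"],
}


def inside(bbox, region):
    """True if bbox lies completely within region."""
    left, top, right, bottom = bbox
    r_left, r_top, r_right, r_bottom = region
    return r_left <= left and r_top <= top and right <= r_right and bottom <= r_bottom


def label_block(block):
    """Logical labeling: map a layout block to a logical object by position."""
    for label, region in REGIONS.items():
        if inside(block.bbox, region):
            return label
    return None


def words(obj):
    """Walk the hierarchy depth-first and yield the word objects in order."""
    if obj.kind == "word":
        yield obj
    for child in obj.children:
        yield from words(child)


def recognize(block, label):
    """Replace each OCR word by its closest vocabulary entry, if one is close enough."""
    vocabulary = VOCABULARIES.get(label, [])
    corrected = []
    for word in words(block):
        match = get_close_matches(word.text, vocabulary, n=1, cutoff=0.6)
        corrected.append(match[0] if match else word.text)
    return " ".join(corrected)


def analyze(page):
    """Return {logical label: recognized text} for every block that gets a label."""
    result = {}
    for block in page.children:
        label = label_block(block)
        if label is not None:
            result[label] = recognize(block, label)
    return result


def make_block(bbox, tokens):
    """Helper for the example: a block holding one line of words."""
    line = LayoutObject("line", bbox, children=[LayoutObject("word", bbox, t) for t in tokens])
    return LayoutObject("block", bbox, children=[line])


# Invented sample letter; "Marz" imitates an OCR error in the date block.
page = LayoutObject("page", (0.0, 0.0, 1.0, 1.0), children=[
    make_block((0.05, 0.02, 0.50, 0.10), ["Muster", "GmbH"]),
    make_block((0.65, 0.20, 0.95, 0.25), ["12.", "Marz", "1992"]),
    make_block((0.10, 0.40, 0.90, 0.80), ["Sehr", "geehrte", "Damen"]),
])
result = analyze(page)
print(result)
```

In this sketch the letter body receives no label because it falls outside every assumed region, and only the date block is corrected because it is the only logical object with a vocabulary. A model-based system as described in the abstract would also draw on syntactic knowledge, which is not modeled here.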