The proliferation of electronic document formats impedes the dissemination and management of documents. Indexing and navigation require a common format that carries structural information. While some formats make it easy to decode and preserve a document's structure, often the only easily obtainable representation is PostScript, in which only geometric information remains. Even if an organization is willing to move all of its document-producing activities to a structure-preserving format such as HTML, its existing documents still need to be converted. This paper addresses the difficult problem of extracting the structure of a document from a geometric representation. An interactive tool has been developed that extracts document content and structure from a geometric representation (PostScript). It successfully analyzes several documents produced with different tools and outputs the structural information in the HyperText Markup Language (HTML). The end user, when presented with the extracted document structure, can interactively modify it if needed. The tool is easily extended to recognize new constructs and is aimed at organizations that need to convert numerous documents for searching and browsing on intranets or on the Internet.
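
The abstract does not describe how the tool infers structure, so the following is only a minimal sketch of the general idea of recovering structure from geometry, not the authors' method. It assumes that text fragments have already been extracted from the PostScript with their position and font size; the `Fragment` type, the `to_html` function, and the two heuristics (the most common font size is body text, and a large vertical gap starts a new block) are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from html import escape


@dataclass
class Fragment:
    """Hypothetical text run with the geometry an interpreter might report."""
    text: str
    x: float          # left edge, in points
    y: float          # baseline, in points, measured from the top of the page
    font_size: float  # in points


def to_html(fragments, line_gap_factor=1.5):
    """Guess headings and paragraphs from font size and vertical gaps."""
    if not fragments:
        return ""
    # Assumption: the most frequent font size on the page is body text.
    body_size = Counter(f.font_size for f in fragments).most_common(1)[0][0]
    blocks = []  # list of (tag, [texts]) in reading order
    prev = None
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        # Assumption: anything larger than body text is a heading.
        tag = "h1" if frag.font_size > body_size else "p"
        same_block = (
            prev is not None
            and blocks[-1][0] == tag
            and frag.y - prev.y <= line_gap_factor * frag.font_size
        )
        if same_block:
            blocks[-1][1].append(frag.text)
        else:
            blocks.append((tag, [frag.text]))
        prev = frag
    return "\n".join(
        f"<{tag}>{escape(' '.join(texts))}</{tag}>" for tag, texts in blocks
    )


# Made-up page: one large-font line followed by three body-text lines.
page = [
    Fragment("Introduction", 72, 100, 18),
    Fragment("Documents come in many formats.", 72, 130, 10),
    Fragment("Most keep only geometry.", 72, 142, 10),
    Fragment("A new paragraph starts here.", 72, 170, 10),
]
print(to_html(page))
```

On this made-up page the sketch emits one heading and two paragraphs. Heuristics of this kind misfire easily, for example on multi-column layouts or documents with several heading levels, which is one reason an interactive correction step like the one the abstract mentions is useful.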