Digital libraries collect, organize and provide end users with large quantities of selected documents. Although these documents come in a variety of formats, it is desirable to deliver them to end users in a uniform way, and Web formats are a suitable choice for this purpose. However, the layout of a Web document is flexible, being determined at runtime by the interpreter (e.g., the browser), whereas documents coming from a library should preserve their original layout when displayed to end users. Rendering pages as raster images would preserve the layout, but it would not allow users to access the actual content of the document's components (text and images). This paper presents a technique for reproducing the original layout of a document in an HTML file while preserving the specific nature of its components (text, images, formulas, tables, algorithms). It builds on the DoMInUS framework, which can process documents in several source formats.
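The abstract does not say how DoMInUS actually generates its HTML output. As a rough illustration only, the sketch below assumes that layout analysis has already produced a list of components, each with a kind and a bounding box in page coordinates, and places each component with CSS absolute positioning. This keeps the original layout while text stays as real, selectable text rather than pixels. The `Component` class, its fields and the `render_page` function are hypothetical names, not part of DoMInUS; in this sketch every non-image component is emitted as escaped text, so formulas, tables and algorithms would need dedicated renderers that are not shown.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Component:
    """A laid-out page component (hypothetical structure, not DoMInUS's)."""
    kind: str      # e.g. "text" or "image"; labels are illustrative
    x: float       # left edge, in page points
    y: float       # top edge, in page points
    width: float
    height: float
    content: str   # the text itself, or an image URL for "image"


def render_page(components, page_width, page_height):
    """Return an HTML fragment placing each component at its original position."""
    # The page box is the positioning context for its absolutely placed children.
    parts = [
        f'<div class="page" style="position:relative;'
        f'width:{page_width}pt;height:{page_height}pt">'
    ]
    for c in components:
        box = (f"position:absolute;left:{c.x}pt;top:{c.y}pt;"
               f"width:{c.width}pt;height:{c.height}pt")
        if c.kind == "image":
            # Images are referenced, not rasterized together with the text.
            src = escape(c.content, quote=True)
            parts.append(f'<img style="{box}" src="{src}" alt="">')
        else:
            # Text remains real, searchable, selectable HTML text.
            cls = escape(c.kind, quote=True)
            parts.append(f'<div class="{cls}" style="{box}">{escape(c.content)}</div>')
    parts.append("</div>")
    return "\n".join(parts)


# Usage: an A4 page (595 x 842 pt) with one text block and one figure.
page = [
    Component("text", 72, 72, 451, 20, "Results & discussion"),
    Component("image", 72, 110, 200, 150, "fig1.png"),
]
print(render_page(page, 595, 842))
```

Absolute positioning is the simplest way to pin components to their original coordinates; its trade-off is that the page no longer reflows on small screens, which matches the stated goal of preserving the original layout rather than adapting it.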