Hi-Fi HTML rendering of multi-format documents in DoMInUS

Authors:
Stefano Ferilli; Floriana Esposito; Domenico Redavid
Affiliations:
University of Bari, Bari, Italy; University of Bari, Bari, Italy; Artificial Brain S.r.l., Bari, Italy
Venue:
Proceedings of the 2013 ACM Symposium on Document Engineering
Year:
2013

Abstract

Digital libraries collect, organize, and provide end users with large quantities of selected documents. Although these documents come in a variety of formats, it is desirable that they be delivered to end users in a uniform way, and Web formats are a suitable choice for this purpose. Web documents, however, are very flexible in their layout, which is determined at runtime by the interpreter, whereas documents coming from a library should preserve their original layout when displayed. Raster images would preserve the layout, but they would not let the user access the actual content of the document's components (text and images). This paper presents a technique for rendering the original layout of a document in an HTML file while preserving the specific nature of its components (text, images, formulas, tables, algorithms). It builds on the DoMInUS framework, which can process documents in several source formats.
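
The abstract does not describe how the rendering is implemented, so the following is only a minimal sketch of the general idea of layout-preserving HTML, not the authors' method. It assumes that layout analysis has already produced a bounding box for each component, and it places each component with absolute CSS positioning so that the text stays selectable instead of being rasterized. The `Block` type, its fields, and the `render_page` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Block:
    """Hypothetical layout component: a bounding box in points plus its content."""
    kind: str      # "text" or "image" (assumed categories, for illustration only)
    x: float
    y: float
    width: float
    height: float
    content: str   # the text itself, or an image URL when kind == "image"


def render_page(blocks, page_width, page_height):
    """Return an HTML fragment that places each block at its original position."""
    # The page container is the positioning context for all its blocks.
    parts = [
        f'<div style="position:relative;width:{page_width}pt;height:{page_height}pt">'
    ]
    for b in blocks:
        style = (
            f"position:absolute;left:{b.x}pt;top:{b.y}pt;"
            f"width:{b.width}pt;height:{b.height}pt"
        )
        if b.kind == "image":
            parts.append(f'<img src="{escape(b.content)}" style="{style}" alt="">')
        else:
            # Text is emitted as real, escaped text so it remains searchable.
            parts.append(f'<div style="{style}">{escape(b.content)}</div>')
    parts.append("</div>")
    return "\n".join(parts)


# Example page (A4 size in points) with one text block and one image block.
page = [
    Block("text", 72, 72, 451, 24, "Hi-Fi HTML rendering <draft>"),
    Block("image", 72, 120, 200, 150, "figure1.png"),
]
html_page = render_page(page, 595, 842)
```

Components such as formulas, tables, and algorithms would need their own handling, which this sketch does not attempt.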