This paper presents a method for verifying that PDF documents are compatible with the publication models provided by scientific publishers. We first consider the problem of converting a document from PDF to XML. We then present an analysis of the document's graphical layout that operates in two phases. The first phase builds a model through a semi-automatic process with limited user interaction. The second phase compares and matches submitted documents against this model. The experimental results report the degree of each document's compatibility with the model, together with error and warning messages.
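To make the comparison phase concrete, the following is a minimal sketch of how a submitted document's layout might be checked against a publication model. It is an illustration only, not the method used in the paper: the `Rule` class, the `check_layout` function, the property names, and all numeric values are hypothetical, and the layout is assumed to have already been extracted (for example from the XML conversion) into a flat mapping of property names to numbers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rule:
    """Hypothetical model rule: expected value of one layout property."""
    expected: float
    tolerance: float
    required: bool = True  # a violated required rule is an error, otherwise a warning


def check_layout(
    model: Dict[str, Rule], layout: Dict[str, float]
) -> Tuple[float, List[str], List[str]]:
    """Compare extracted layout properties against the model.

    Returns the fraction of rules satisfied plus error and warning messages.
    """
    errors: List[str] = []
    warnings: List[str] = []
    passed = 0
    for name, rule in model.items():
        value = layout.get(name)
        if value is not None and abs(value - rule.expected) <= rule.tolerance:
            passed += 1
            continue
        found = "missing" if value is None else value
        message = (
            f"{name}: expected {rule.expected} +/- {rule.tolerance}, found {found}"
        )
        (errors if rule.required else warnings).append(message)
    # An empty model imposes no constraints, so treat it as fully compatible.
    score = passed / len(model) if model else 1.0
    return score, errors, warnings


# Assumed example values (in points); not taken from any real publisher template.
model = {
    "title_font_size": Rule(17.0, 0.5),
    "body_font_size": Rule(10.0, 0.5),
    "left_margin": Rule(54.0, 2.0, required=False),
}
layout = {"title_font_size": 17.0, "body_font_size": 12.0, "left_margin": 60.0}

score, errors, warnings = check_layout(model, layout)
print(f"compatibility: {score:.2f}")
for line in errors:
    print("ERROR  ", line)
for line in warnings:
    print("WARNING", line)
```

In this sketch the degree of compatibility is simply the share of rules a document satisfies. A real system would likely weight rules differently and compare richer structures, such as positioned text blocks, than flat numeric properties.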