A practical method for compatibility evaluation of portable document formats

Authors:
Dariusz Król; Michał Łopatka
Affiliations:
School of Design, Engineering and Computing, Bournemouth University, UK, and Institute of Informatics, Wrocław University of Technology, Poland; Faculty of Computer Science and Management, Wrocław University of Technology, Poland
Venue:
ACIIDS'13 Proceedings of the 5th Asian conference on Intelligent Information and Database Systems - Volume Part II
Year:
2013

Abstract

This paper presents a method for verifying that PDF documents are compatible with the publication models provided by scientific publishers. We first consider the problem of converting a document from PDF to XML format. We then present an analysis of the document's graphical layout, which operates in two phases. The first phase builds a model through a semi-automatic process with limited user interaction. The second phase compares and matches submitted documents against that model. The experimental results report the degree of each document's compatibility with the model, together with a list of errors and warning messages.
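
The comparison phase can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the abstract does not say which layout features are compared or how the compatibility degree is computed. Here the model is a hypothetical list of `LayoutRule` entries (expected value, accepted tolerance, hard limit), the submitted document is a dictionary of measured values, and the score is simply the fraction of rules satisfied within tolerance.

```python
from dataclasses import dataclass


@dataclass
class LayoutRule:
    """One hypothetical entry of a publication model (names are assumptions)."""
    name: str          # layout feature, e.g. "left_margin_mm"
    expected: float    # value required by the model
    tolerance: float   # deviation up to this value is accepted
    hard_limit: float  # deviation above this value is an error, otherwise a warning


def check_document(model, measured):
    """Compare measured layout values against the model.

    Returns (score, report), where score is the fraction of rules met
    within tolerance and report lists error and warning messages.
    """
    report = {"errors": [], "warnings": []}
    passed = 0
    for rule in model:
        if rule.name not in measured:
            report["errors"].append(f"{rule.name}: not found in document")
            continue
        deviation = abs(measured[rule.name] - rule.expected)
        if deviation <= rule.tolerance:
            passed += 1
        elif deviation <= rule.hard_limit:
            report["warnings"].append(
                f"{rule.name}: expected {rule.expected}, got {measured[rule.name]}"
            )
        else:
            report["errors"].append(
                f"{rule.name}: expected {rule.expected}, got {measured[rule.name]}"
            )
    # An empty model imposes no constraints, so treat it as fully compatible.
    score = passed / len(model) if model else 1.0
    return score, report


# Hypothetical model and measurements, invented for illustration only.
example_model = [
    LayoutRule("left_margin_mm", 25.0, 0.5, 2.0),
    LayoutRule("body_font_pt", 10.0, 0.5, 1.0),
    LayoutRule("column_width_mm", 122.0, 1.0, 3.0),
]
example_measured = {
    "left_margin_mm": 25.0,
    "body_font_pt": 10.75,
    "column_width_mm": 130.0,
}

score, report = check_document(example_model, example_measured)
print(f"compatibility: {score:.0%}")
for message in report["errors"]:
    print("ERROR  ", message)
for message in report["warnings"]:
    print("WARNING", message)
```

Separating a tolerance from a hard limit is one simple way to produce both warnings and errors from the same rule; a real system would presumably derive its rules from the semi-automatically built model rather than from hand-written values.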