Document understanding of graphical content in natively digital PDF documents

Authors: Aysylu Gabdulkhakova; Tamir Hassan
Affiliations: Ufa State Aviation Technical University, Ufa, Russian Fed.; Technische Universität Wien, Vienna, Austria
Venue: Proceedings of the 2012 ACM symposium on Document engineering
Year: 2012

Abstract

This paper presents an object-based method for analysing the content drawn by graphical operators in natively digital PDF documents. We propose that graphical content in a document can be classified as either structural or non-structural, and we present an output model for the analysis result. Heuristic techniques group the drawing instructions into regions and determine each region's logical role in the document's structure. Experimental results demonstrate the effectiveness of the algorithm.
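
The abstract does not spell out the heuristics, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes each drawn path has already been reduced to a bounding box, groups boxes that touch or lie within a small gap into regions, and labels a region "structural" when it consists solely of thin rules (as a table grid or separator line would). The names `Box`, `group_into_regions`, `classify_region`, and the thresholds `gap` and `max_thickness` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one drawn path, in PDF points (assumed input)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def near(self, other: "Box", gap: float) -> bool:
        # True when the boxes overlap or lie within `gap` points of each other.
        return (self.x0 - gap <= other.x1 and other.x0 - gap <= self.x1
                and self.y0 - gap <= other.y1 and other.y0 - gap <= self.y1)


def group_into_regions(boxes: List[Box], gap: float = 3.0) -> List[List[Box]]:
    """Cluster boxes into regions by proximity, using a small union-find."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes[i].near(boxes[j], gap):
                parent[find(i)] = find(j)

    regions: Dict[int, List[Box]] = {}
    for i, box in enumerate(boxes):
        regions.setdefault(find(i), []).append(box)
    return list(regions.values())


def classify_region(region: List[Box], max_thickness: float = 2.0) -> str:
    """Assumed heuristic: a region made only of thin rules is structural."""
    def is_rule(box: Box) -> bool:
        return min(box.x1 - box.x0, box.y1 - box.y0) <= max_thickness

    return "structural" if all(is_rule(b) for b in region) else "non-structural"


if __name__ == "__main__":
    page = [
        Box(0, 100, 200, 101),     # horizontal rule
        Box(50, 0, 51, 200),       # vertical rule crossing it
        Box(400, 400, 450, 450),   # part of an illustration
        Box(445, 445, 500, 500),   # overlapping part of the same illustration
    ]
    for region in group_into_regions(page):
        print(classify_region(region), len(region))
```

The pairwise comparison is quadratic in the number of boxes, which is acceptable for a sketch; a real system would likely use a spatial index and richer cues such as stroke width, fill, and alignment with text.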