Document understanding of graphical content in natively digital PDF documents

  • Authors:
  • Aysylu Gabdulkhakova; Tamir Hassan

  • Affiliations:
  • Ufa State Aviation Technical University, Ufa, Russian Federation; Technische Universität Wien, Vienna, Austria

  • Venue:
  • Proceedings of the 2012 ACM Symposium on Document Engineering
  • Year:
  • 2012

Abstract

This paper presents an object-based method for analysing the content drawn by graphical operators in natively digital PDF documents. We propose that graphical content in a document can be classified as either structural or non-structural, and we present an output model for the results of our analysis. Heuristic techniques are used to group the drawing instructions into regions and to determine their logical role in the document's structure. Experimental results demonstrate the effectiveness of the algorithm.
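
The abstract does not specify which heuristics are used, so the following is only an illustrative sketch of what grouping drawing instructions into regions could look like, not the authors' algorithm. It assumes each graphical instruction has already been reduced to an axis-aligned bounding box, and it merges boxes that overlap or lie within a small gap of each other. The `Box` type, the `near` test, the `group_regions` function, and the `gap` threshold are all hypothetical names and values introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of one drawn primitive (hypothetical type, not from the paper)."""
    x0: float
    y0: float
    x1: float
    y1: float


def near(a: Box, b: Box, gap: float) -> bool:
    """True if the two boxes overlap or lie within `gap` units of each other."""
    return (a.x0 - gap <= b.x1 and b.x0 - gap <= a.x1
            and a.y0 - gap <= b.y1 and b.y0 - gap <= a.y1)


def group_regions(boxes: list[Box], gap: float = 2.0) -> list[list[Box]]:
    """Group boxes into regions: connected components under the `near` relation.

    The proximity rule and the default `gap` are assumptions chosen for
    illustration only.
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is quadratic, which is acceptable for a sketch.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if near(boxes[i], boxes[j], gap):
                parent[find(i)] = find(j)

    regions: dict[int, list[Box]] = {}
    for i, box in enumerate(boxes):
        regions.setdefault(find(i), []).append(box)
    return list(regions.values())


# Two touching ruling lines form one region; a distant rectangle forms another.
example = [Box(0, 0, 10, 1), Box(0, 0, 1, 10), Box(100, 100, 110, 110)]
example_regions = group_regions(example)
```

A later step would then assign each region a logical role, such as structural (for example, ruling lines that separate content) or non-structural (for example, an illustration). Since the abstract does not state the criteria for that decision, it is left out of this sketch.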