Document analysis applied to fragments: feature set for the reconstruction of torn documents

  • Authors: Markus Diem; Florian Kleber; Robert Sablatnig
  • Affiliations: Institute of Computer Aided Automation, Vienna (all three authors)
  • Venue: DAS '10: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
  • Year: 2010

Abstract

Document analysis is typically used to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout or structure of a document. In this paper, document analysis is applied to snippets of torn documents to compute features that can be used for reconstruction. The main aim is to handle snippets of varying size and content (e.g. handwritten or printed text). Documents may be destroyed deliberately to make the printed content unavailable (e.g. in business crime) or through time-induced degradation of ancient documents (e.g. poor storage conditions). Current reconstruction methods for manually torn documents rely on the shape of the snippets or on techniques such as inpainting and texture synthesis. This paper shows the potential of applying document analysis techniques to snippets in order to support a reconstruction algorithm with additional features. These comprise a rotational analysis, a color analysis, a line detection, a paper type analysis (checked, lined, blank) and a classification of the text (printed or handwritten). Preliminary results show that these features can be determined reliably on a real dataset consisting of 690 snippets.
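
To make the paper type feature (checked, lined, blank) more concrete, the following is a minimal sketch of one way such a label could be derived. It is not the authors' method, which the abstract does not describe. It assumes a hypothetical binary mask (1 = ink, 0 = paper) taken from a rectangular crop of a snippet that has already been rotated upright, and it counts ruling lines with simple row and column projection profiles. The names `ruling_count` and `classify_paper` and the thresholds `min_fill` and `min_lines` are illustrative assumptions.

```python
from typing import List, Sequence


def ruling_count(profile: Sequence[float], min_fill: float = 0.8) -> int:
    """Count runs of consecutive profile entries whose ink fill is >= min_fill.

    Each run is treated as one ruling line, so a line that is several
    pixels thick is still counted once.
    """
    count = 0
    inside = False
    for fill in profile:
        is_line = fill >= min_fill
        if is_line and not inside:
            count += 1
        inside = is_line
    return count


def classify_paper(
    mask: List[List[int]], min_fill: float = 0.8, min_lines: int = 2
) -> str:
    """Label a binary mask as 'checked', 'lined' or 'blank'.

    Assumption: `mask` is a rectangular, upright crop (1 = ink, 0 = paper).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    if height == 0 or width == 0:
        return "blank"

    # Fraction of ink pixels per row and per column (projection profiles).
    row_fill = [sum(row) / width for row in mask]
    col_fill = [
        sum(mask[y][x] for y in range(height)) / height for x in range(width)
    ]

    horizontal = ruling_count(row_fill, min_fill)
    vertical = ruling_count(col_fill, min_fill)

    if horizontal >= min_lines and vertical >= min_lines:
        return "checked"
    if horizontal >= min_lines or vertical >= min_lines:
        return "lined"
    return "blank"
```

Real torn snippets have irregular outlines and faint or broken rulings, so a fixed fill threshold like this would likely need to be replaced by something more robust, such as a line detector restricted to the snippet's interior.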