Natural language inspired approach for handwritten text line detection in legacy documents

  • Authors:
  • Vicente Bosch Campos; Alejandro Héctor Toselli; Enrique Vidal

  • Affiliations:
  • Univ. Politécnica Valencia, Valencia, Spain; Univ. Politécnica Valencia, Valencia, Spain; Univ. Politécnica Valencia, Valencia, Spain

  • Venue:
  • LaTeCH '12 Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
  • Year:
  • 2012

Abstract

Document layout analysis is an important prerequisite for handwritten text recognition, among other applications. In handwritten legacy documents, text is commonly laid out as one or more paragraphs composed of parallel text lines. This paper presents an approach to handwritten text line detection that uses machine-learning techniques and methods widely used in natural language processing. It shows that text line detection can be solved accurately with a formal methodology, in contrast to most of the heuristic approaches proposed in the literature. Experimental results show the impact of increasingly constrained "vertical layout language models" on text line detection accuracy.
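
The abstract does not spell out the models or features used, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a page is cut into horizontal strips, each described by a hypothetical ink density in (0, 1), and that a bigram transition table stands in for a "vertical layout language model". The label set ("blank", "line"), the toy emission scores, and the function names viterbi, line_spans and detect_lines are all assumptions made for this example. Viterbi decoding, a standard technique in natural language processing, then picks the most likely label sequence.

```python
import math

# Hypothetical vertical region labels (an assumption, not taken from the paper).
LABELS = ["blank", "line"]


def viterbi(scores, log_trans, log_init):
    """Return the most likely label sequence for a column of strips.

    scores:    list of dicts, per-strip log-likelihood of each label
    log_trans: dict mapping (previous, current) to a log transition probability
    log_init:  dict mapping label to a log initial probability
    """
    best = [{lab: log_init[lab] + scores[0][lab] for lab in LABELS}]
    back = []
    for obs in scores[1:]:
        col, ptr = {}, {}
        for cur in LABELS:
            # Best predecessor for the current label.
            prev = max(LABELS, key=lambda p: best[-1][p] + log_trans[(p, cur)])
            col[cur] = best[-1][prev] + log_trans[(prev, cur)] + obs[cur]
            ptr[cur] = prev
        best.append(col)
        back.append(ptr)
    # Trace the best path backwards from the best final label.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def line_spans(path):
    """Group consecutive "line" strips into (first, last) index pairs."""
    spans, start = [], None
    for i, lab in enumerate(path + ["blank"]):  # sentinel closes a trailing run
        if lab == "line" and start is None:
            start = i
        elif lab != "line" and start is not None:
            spans.append((start, i - 1))
            start = None
    return spans


def detect_lines(densities, stay=0.7):
    """Label strips from toy ink densities, each strictly between 0 and 1."""
    # Toy emission model (assumption): dense strips look like text.
    scores = [{"line": math.log(d), "blank": math.log(1.0 - d)} for d in densities]
    # Bigram "layout language model": labels tend to persist between strips.
    log_trans = {
        (p, c): math.log(stay if p == c else 1.0 - stay)
        for p in LABELS
        for c in LABELS
    }
    log_init = {lab: math.log(1.0 / len(LABELS)) for lab in LABELS}
    path = viterbi(scores, log_trans, log_init)
    return path, line_spans(path)


densities = [0.05, 0.9, 0.85, 0.4, 0.9, 0.05, 0.05, 0.9, 0.85, 0.05]
path, spans = detect_lines(densities)
print(spans)  # [(1, 4), (7, 8)]
```

In this toy input the strip with density 0.4 would be labelled "blank" if judged alone, but the transition model keeps it inside the first text line. Making the transition table more restrictive, for example by forbidding certain label orders, is one way to read "increasingly constrained" in the abstract, though that reading is an assumption here.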