Layout and Language: Exploring Text Block Discovery in Tables Using Linguistic Resources

  • Authors:
  • Affiliations:
  • Venue: ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
  • Year: 2001

Abstract

Identifying the textual content of table cells requires, in part, resolving the ambiguity between multi-row cells and single-row cells, along with other layout-based ambiguities. This paper investigates the application of linguistic resources to this problem and discusses algorithms that exploit both phrasal dictionaries and bigram language models to discover the content of cells in flat text files.
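
To make the bigram idea concrete, the following is a minimal illustrative sketch, not the algorithm from the paper, whose details the abstract does not give. It assumes that two vertically adjacent fragments in the same column are candidates for a single multi-row cell, and that a high smoothed bigram probability across the line break is evidence for merging them. The function names, the add-one smoothing, and the threshold value are all hypothetical choices made for illustration.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over lowercased, whitespace-split sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev) (an assumed smoothing scheme)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def should_merge(upper, lower, unigrams, bigrams, threshold=0.2):
    """Guess whether two stacked text fragments belong to one multi-row cell.

    The decision looks only at the bigram spanning the line break: the last
    token of the upper fragment followed by the first token of the lower one.
    The threshold is a hypothetical tuning parameter.
    """
    upper_tokens = upper.lower().split()
    lower_tokens = lower.lower().split()
    if not upper_tokens or not lower_tokens:
        return False
    p = bigram_prob(upper_tokens[-1], lower_tokens[0], unigrams, bigrams)
    return p >= threshold
```

With a model trained on a handful of phrases such as "total operating income" and "total revenue", `should_merge("Total operating", "income", ...)` would favor a merge, while `should_merge("Total revenue", "Net operating", ...)` would not. A phrasal dictionary could play a similar role by checking whether the joined fragments form a known phrase.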