Information extraction from scanned documents by stochastic page layout analysis

  • Authors:
  • Atsuhiro Takasu; Kenro Aihara

  • Affiliations:
  • National Institute of Informatics, Tokyo, Japan; National Institute of Informatics, Tokyo, Japan

  • Venue:
  • Proceedings of the 2008 ACM Symposium on Applied Computing
  • Year:
  • 2008

Abstract

We propose a stochastic context-free grammar for extracting information from scanned document images. The grammar is designed to disambiguate layout analysis and utilize both layout and text features. We applied this grammar to the problem of extracting bibliographic information from scanned academic papers and found that it can accurately extract information.
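
To illustrate the general idea of using a stochastic context-free grammar to disambiguate layout, the following is a minimal sketch of Viterbi (most probable parse) CYK parsing over a sequence of text-line features. It is not the grammar or algorithm from the paper: the rule set, the probabilities, the labels (HEADER, TITLE, AUTHORS), and the line features ("large", "medium", "names") are all hypothetical and chosen only to show how an ambiguous line can be resolved by the surrounding structure.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (an assumption for
# illustration; not the grammar described in the paper).
BINARY = {  # (left child, right child) -> [(parent, probability)]
    ("TITLE", "AUTHORS"): [("HEADER", 1.0)],
    ("TITLE", "TITLE"): [("TITLE", 0.3)],
    ("AUTHORS", "AUTHORS"): [("AUTHORS", 0.2)],
}
LEXICAL = {  # observed line feature -> [(label, probability)]
    "large": [("TITLE", 0.6)],
    "medium": [("TITLE", 0.1), ("AUTHORS", 0.2)],  # ambiguous line
    "names": [("AUTHORS", 0.6)],
}


def viterbi_parse(tokens, start="HEADER"):
    """Return (probability, per-line labels) of the most probable parse."""
    n = len(tokens)
    # best[(i, j)][A] = (probability, backpointer) for the span tokens[i:j]
    best = defaultdict(dict)
    for i, tok in enumerate(tokens):
        for label, p in LEXICAL.get(tok, []):
            best[(i, i + 1)][label] = (p, None)
    # Combine adjacent spans bottom-up, keeping the best score per label.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for b, (pb, _) in best[(i, k)].items():
                    for c, (pc, _) in best[(k, j)].items():
                        for a, pr in BINARY.get((b, c), []):
                            p = pr * pb * pc
                            if p > best[(i, j)].get(a, (0.0, None))[0]:
                                best[(i, j)][a] = (p, (k, b, c))
    if start not in best[(0, n)]:
        return 0.0, []  # no parse under this grammar

    labels = [None] * n

    def walk(i, j, a):
        # Follow backpointers down to single lines to read off labels.
        back = best[(i, j)][a][1]
        if back is None:
            labels[i] = a
        else:
            k, b, c = back
            walk(i, k, b)
            walk(k, j, c)

    walk(0, n, start)
    return best[(0, n)][start][0], labels


prob, labels = viterbi_parse(["large", "medium", "names"])
print(labels, prob)
```

In this toy setting the "medium" line could belong to either the title or the author block; the parser picks whichever assignment gives the whole header the highest probability. A real system would presumably derive rule probabilities from training data and combine layout and text features, as the abstract describes.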