Automated layout recognition

  • Authors: Lynn Golebiowski

  • Affiliations: Booz, Allen and Hamilton, Annapolis Junction, Md

  • Venue: Proceedings of the 1st ACM workshop on Hardcopy document processing
  • Year: 2004

Abstract

To develop document image layout classifiers, each document image is represented by a set of labeled polygons corresponding to the pair-wise relationships between objects on the page. "Wanted" and "Unwanted" training sets are used to generate a polygon weight based on frequency of occurrence in both sets (term frequency). Unknown documents are scored by comparing polygons to those occurring in the wanted set. A score, weighted by the term frequency for the matching polygons, is computed. Experiments are performed against the NIST Structured Forms Database based on single and multiple layout collections using a variety of training samples.
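The scoring scheme described in the abstract might be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the weight formula, so the sketch assumes a polygon's weight is the fraction of its training-document occurrences that fall in the wanted set, and that an unknown document's score is the mean weight over its distinct polygons. The function names `polygon_weights` and `score`, and the string polygon labels, are hypothetical.

```python
from collections import Counter


def polygon_weights(wanted_docs, unwanted_docs):
    """Weight each polygon seen in the wanted set.

    ASSUMPTION: weight = wanted document count / (wanted + unwanted
    document count). The abstract only says the weight is based on
    frequency of occurrence in both sets.
    """
    # Count each polygon once per document, even if it repeats on the page.
    wanted = Counter(p for doc in wanted_docs for p in set(doc))
    unwanted = Counter(p for doc in unwanted_docs for p in set(doc))
    return {p: wanted[p] / (wanted[p] + unwanted[p]) for p in wanted}


def score(doc, weights):
    """Score an unknown document against the wanted set.

    ASSUMPTION: the score is the sum of the weights of polygons that also
    occur in the wanted set, normalized by the number of distinct polygons
    in the document. Polygons never seen in the wanted set contribute 0.
    """
    polygons = set(doc)
    if not polygons:
        return 0.0
    return sum(weights.get(p, 0.0) for p in polygons) / len(polygons)


# Hypothetical polygon labels standing in for pair-wise object relationships.
wanted_docs = [{"A", "B"}, {"A", "C"}]
unwanted_docs = [{"B", "D"}]

weights = polygon_weights(wanted_docs, unwanted_docs)
known_layout = score({"A", "B"}, weights)
unknown_layout = score({"D", "E"}, weights)
```

Here polygon "A" appears only in wanted documents and so carries full weight, while "B" appears in both sets and is discounted. A document made up of polygons absent from the wanted set scores zero. A threshold on the score would then separate wanted from unwanted layouts; where to set it is left open in this sketch.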