Automated layout recognition

Authors:
Lynn Golebiowski
Affiliations:
Booz, Allen and Hamilton, Annapolis Junction, Md
Venue:
Proceedings of the 1st ACM workshop on Hardcopy document processing
Year:
2004

Abstract

To develop document image layout classifiers, each document image is represented by a set of labeled polygons corresponding to the pairwise relationships between objects on the page. "Wanted" and "Unwanted" training sets are used to assign each polygon a weight based on its frequency of occurrence in both sets (term frequency). Unknown documents are scored by comparing their polygons to those occurring in the wanted set and computing a score weighted by the term frequency of the matching polygons. Experiments are performed against the NIST Structured Forms Database, using single-layout and multiple-layout collections and a variety of training samples.
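The abstract does not specify how the polygons are built, how the weight is computed, or how the score is normalized. The Python sketch below illustrates the overall wanted/unwanted weighting and scoring scheme under explicit assumptions. Each pairwise polygon is reduced to a hashable key made of the two object labels and the quantized offset between the object centres. Polygons "match" only when their keys are equal, which is a simplification; a shape-distance comparison could replace it. A polygon's weight is its document frequency in the wanted set minus its document frequency in the unwanted set, clamped at zero. A document's score is the sum of its matched weights divided by its polygon count. `PageObject`, `polygon_keys`, `polygon_weights`, `score`, and the `cell` parameter are hypothetical names, not the author's.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PageObject:
    label: str  # hypothetical object class, e.g. "text", "line", "box"
    x: float    # centre x in page units
    y: float    # centre y in page units


# Assumed stand-in for a labeled polygon: (label_a, label_b, dx_cells, dy_cells).
PolygonKey = tuple[str, str, int, int]


def polygon_keys(objects: list[PageObject], cell: float = 50.0) -> set[PolygonKey]:
    """Summarise the polygon for each object pair by the two labels and the
    offset between their centres, quantised to a grid of `cell` units."""
    keys = set()
    for a, b in combinations(objects, 2):
        # Put the pair in a canonical order so the key ignores input order.
        if (a.label, a.x, a.y) > (b.label, b.x, b.y):
            a, b = b, a
        dx = round((b.x - a.x) / cell)
        dy = round((b.y - a.y) / cell)
        keys.add((a.label, b.label, dx, dy))
    return keys


def polygon_weights(wanted: list[set[PolygonKey]],
                    unwanted: list[set[PolygonKey]]) -> dict[PolygonKey, float]:
    """Weight each polygon seen in the wanted set by its document frequency
    there minus its document frequency in the unwanted set (assumed formula)."""
    df_wanted = Counter(k for doc in wanted for k in doc)
    df_unwanted = Counter(k for doc in unwanted for k in doc)
    weights = {}
    for key, count in df_wanted.items():
        p_wanted = count / len(wanted)
        p_unwanted = df_unwanted[key] / len(unwanted) if unwanted else 0.0
        weights[key] = max(p_wanted - p_unwanted, 0.0)
    return weights


def score(doc: set[PolygonKey], weights: dict[PolygonKey, float]) -> float:
    """Sum the weights of the document's polygons that match the wanted set,
    normalised by the document's polygon count (assumed normalisation)."""
    if not doc:
        return 0.0
    return sum(weights.get(k, 0.0) for k in doc) / len(doc)


if __name__ == "__main__":
    form = [PageObject("text", 100, 100), PageObject("line", 100, 200),
            PageObject("box", 300, 200)]
    # Same layout with small positional jitter, as from a second scan.
    form_rescan = [PageObject("text", 105, 98), PageObject("line", 102, 203),
                   PageObject("box", 298, 205)]
    other = [PageObject("text", 100, 100), PageObject("text", 400, 600)]
    # Unwanted page that shares the text/line relationship with the form.
    partial = [PageObject("line", 500, 500), PageObject("text", 500, 400)]

    w = polygon_weights([polygon_keys(form)],
                        [polygon_keys(other), polygon_keys(partial)])
    print(round(score(polygon_keys(form_rescan), w), 2))
    print(round(score(polygon_keys(other), w), 2))
```

Run as a script, this prints 0.83 for the rescanned form and 0.0 for the unrelated page. The text/line polygon is down-weighted to 0.5 because it also occurs in half of the unwanted pages, so the rescan does not reach a perfect score even though all three of its polygons match.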