An Efficiently Computable Metric for Comparing Polygonal Shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Automatic Script Identification From Document Images Using Cluster-Based Templates. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Document Image Layout Comparison and Classification. ICDAR '99: Proceedings of the Fifth International Conference on Document Analysis and Recognition.
Document Ranking by Layout Relevance. ICDAR '05: Proceedings of the Eighth International Conference on Document Analysis and Recognition.
To build document image layout classifiers, each document image is represented by a set of labeled polygons, one for each pair-wise relationship between objects on the page. "Wanted" and "Unwanted" training sets are used to assign each polygon a weight based on how often it occurs in each set (term frequency). An unknown document is scored by comparing its polygons to those occurring in the wanted set and computing a score weighted by the term frequency of the matching polygons. Experiments on the NIST Structured Forms Database cover both single-layout and multiple-layout collections, using a variety of training samples.
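The weighting and scoring steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: each polygon is reduced to a hashable label and matched by exact equality (the original compares polygon shapes), the weight formula `tf_wanted / (tf_wanted + tf_unwanted)` is an assumed way of combining the two term frequencies, and the page score is assumed to be the average weight of the page's polygons. The function names `polygon_weights` and `score_page` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List

# Hypothetical representation: a page is a list of polygon "terms".
# Each term is a hashable label standing in for one labeled polygon;
# exact label equality replaces real polygon shape matching (an assumption).
Page = List[Hashable]


def polygon_weights(wanted: Iterable[Page], unwanted: Iterable[Page]) -> Dict[Hashable, float]:
    """Weight each polygon seen in the wanted set by its relative term frequency.

    Assumed formula: w = tf_wanted / (tf_wanted + tf_unwanted), so a polygon
    that never occurs in the unwanted set gets weight 1.0.
    """
    tf_wanted = Counter(p for page in wanted for p in page)
    tf_unwanted = Counter(p for page in unwanted for p in page)
    # Every key of tf_wanted has count >= 1, so the denominator is never zero.
    return {p: tf_wanted[p] / (tf_wanted[p] + tf_unwanted[p]) for p in tf_wanted}


def score_page(page: Page, weights: Dict[Hashable, float]) -> float:
    """Average weight of a page's polygons; polygons not in the wanted set add 0."""
    if not page:
        return 0.0
    return sum(weights.get(p, 0.0) for p in page) / len(page)


if __name__ == "__main__":
    # Hypothetical training pages built from made-up relationship labels.
    wanted = [
        ["header-above-table", "name-left-of-date"],
        ["header-above-table", "name-left-of-date"],
    ]
    unwanted = [["header-above-table", "logo-above-text"]]

    weights = polygon_weights(wanted, unwanted)
    print(weights)
    print(score_page(["header-above-table", "name-left-of-date"], weights))
    print(score_page(["logo-above-text"], weights))
```

In this toy example, `"header-above-table"` occurs twice in the wanted set and once in the unwanted set, so it receives weight 2/3, while `"name-left-of-date"` occurs only in the wanted set and receives weight 1.0. A page consisting only of polygons never seen in the wanted set scores 0. Ranking unknown documents then amounts to sorting them by `score_page`; a faithful version would replace exact label lookup with a polygon similarity measure.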