In the field of computer analysis of document images, the problems of physical and logical layout analysis have been approached through a variety of heuristic, rule-based, and grammar-based techniques. In this paper we investigate the effectiveness of statistical pattern recognition algorithms for solving these two problems, and report results suggesting that these more complex and powerful techniques are worth pursuing. First, we developed a new software environment for manual page image segmentation and labeling, and used it to create a dataset containing 932 page images from academic journals. Next, a physical layout analysis algorithm based on a logistic regression classifier was developed, and found to outperform existing algorithms of comparable complexity. Finally, three statistical classifiers were applied to the logical layout analysis problem, also with encouraging results.
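As a rough illustration of how a logistic regression classifier can drive a physical layout decision, the sketch below trains one to decide whether a whitespace gap between two text blocks should split them into separate regions. This is not the algorithm from the paper: the two features (gap height relative to line height, and relative font-size change), the toy data, and the function names `train_logistic` and `predict_split` are all assumptions made for the example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p(split | x) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_split(w, b, x):
    """Return True if the gap described by x is classified as a region boundary."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5

# Hypothetical toy data, one row per candidate gap:
# [gap height / line height, relative font-size change across the gap]
X = [[0.2, 0.0], [0.3, 0.1], [0.4, 0.0], [0.5, 0.05],
     [1.5, 0.3], [2.0, 0.1], [1.8, 0.5], [2.5, 0.4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = split into separate regions, 0 = keep merged

w, b = train_logistic(X, y)
train_predictions = [predict_split(w, b, x) for x in X]
```

In a real system the feature vectors would be computed from the page image and the labels would come from a manually segmented dataset such as the one described above; the training loop itself would be unchanged.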