Automatic derivation of logical document structure from generic layout would enable the development of many highly flexible electronic document manipulation tools. The problem can be divided into two parts: segmenting the text into pieces, and classifying each piece as a particular logical structure. This paper proposes an approach to the second part, classifying logical document structures according to their distance from predefined prototypes. The prototypes use minimal linguistic information, which reduces both reliance on OCR accuracy and language dependence. Different classes of logical structures, and the differences in the information required to classify them, are discussed. A prototype format is proposed, existing prototypes and a distance measure are described, and performance results are provided.
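The idea of classifying a text piece by its distance from predefined prototypes can be illustrated with a minimal sketch. The abstract does not specify the prototype format or the distance measure, so everything below is an assumption made for illustration: the feature names (`font_size_ratio`, `line_count`, `bold`), the weights, the two example prototypes, and the weighted Euclidean distance are hypothetical and are not taken from the paper. Only layout features are used, in keeping with the stated aim of depending little on OCR output.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Prototype:
    """Hypothetical prototype: expected layout feature values plus weights."""
    label: str
    features: dict  # feature name -> expected value
    weights: dict   # feature name -> importance (defaults to 1.0 if absent)


def distance(block, proto):
    """Weighted Euclidean distance between a block and a prototype.

    This measure is an assumption for illustration; the paper's own
    distance measure is not given in the abstract.
    """
    total = 0.0
    for name, expected in proto.features.items():
        weight = proto.weights.get(name, 1.0)
        diff = block.get(name, 0.0) - expected
        total += weight * diff * diff
    return sqrt(total)


def classify(block, prototypes):
    """Return the label of the nearest prototype and the distance to it."""
    best = min(prototypes, key=lambda p: distance(block, p))
    return best.label, distance(block, best)


# Illustrative weights and prototypes (invented values, layout features only).
WEIGHTS = {"font_size_ratio": 4.0, "line_count": 0.5, "bold": 1.0}
PROTOTYPES = [
    Prototype("heading",
              {"font_size_ratio": 1.5, "line_count": 1.0, "bold": 1.0},
              WEIGHTS),
    Prototype("paragraph",
              {"font_size_ratio": 1.0, "line_count": 6.0, "bold": 0.0},
              WEIGHTS),
]

if __name__ == "__main__":
    short_bold_block = {"font_size_ratio": 1.4, "line_count": 1, "bold": 1}
    long_plain_block = {"font_size_ratio": 1.0, "line_count": 5, "bold": 0}
    print(classify(short_bold_block, PROTOTYPES))
    print(classify(long_plain_block, PROTOTYPES))
```

In this sketch, a short, bold block set in larger type lies closer to the `heading` prototype, while a multi-line block in body type lies closer to `paragraph`. A real system would likely also need a distance threshold so that blocks far from every prototype can be left unclassified.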