Document Classification and Interpretation through the Inference of Logic-Based Models
ECDL '01 Proceedings of the 5th European Conference on Research and Advanced Technology for Digital Libraries
Making Documents Work: Challenges for Document Understanding
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Thick 2D relations for document understanding
Information Sciences—Informatics and Computer Science: An International Journal
Document image processing begins with the "OCR" phase, after which automatic document "analysis" and "understanding" remain difficult. Most existing systems perform well only in their specific application domains. In this paper, we describe a domain-independent automatic document image understanding system with learning ability. A segmentation method based on "logical closeness" is proposed. A novel and natural representation of document layout structure, the directed weight graph (DWG), is described. To classify a given document, its string representation is matched first, instead of comparing the document with all the sample graphs. Frame templates and a document type hierarchy (DTH) are used to represent the document logical structure and the hierarchical relations among these frame templates, respectively. Two learning methodologies are applied: learning from experience and an enhanced perceptron learning algorithm.
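The two-stage classification idea mentioned in the abstract (string matching first, graph comparison only for a few candidates) might be sketched roughly as follows. This is an illustrative sketch only, not the paper's algorithm: the `LayoutGraph` class, the one-letter string encoding, the use of `difflib.SequenceMatcher` as the string matcher, the edge-weight distance, and the `keep` parameter are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, Tuple


@dataclass
class LayoutGraph:
    """Hypothetical directed weighted graph over layout blocks (not the paper's DWG definition)."""
    labels: Dict[str, str]  # node id -> block kind, e.g. "title"
    edges: Dict[Tuple[str, str], float] = field(default_factory=dict)  # (src, dst) -> weight

    def to_string(self) -> str:
        # Assumed encoding: first letter of each block kind, in node insertion order.
        return "".join(kind[0] for kind in self.labels.values())


def string_similarity(a: str, b: str) -> float:
    # Cheap first-stage score in [0, 1]; SequenceMatcher is a stand-in matcher.
    return SequenceMatcher(None, a, b).ratio()


def graph_distance(g: LayoutGraph, h: LayoutGraph) -> float:
    # Illustrative second-stage score: summed absolute edge-weight differences.
    keys = set(g.edges) | set(h.edges)
    return sum(abs(g.edges.get(k, 0.0) - h.edges.get(k, 0.0)) for k in keys)


def classify(doc: LayoutGraph, samples: Dict[str, LayoutGraph], keep: int = 2) -> str:
    """Shortlist sample types by string similarity, then compare graphs only within the shortlist."""
    doc_str = doc.to_string()
    ranked = sorted(
        samples,
        key=lambda name: string_similarity(doc_str, samples[name].to_string()),
        reverse=True,
    )
    shortlist = ranked[:keep]
    return min(shortlist, key=lambda name: graph_distance(doc, samples[name]))
```

The point of the design is that the costlier graph comparison runs on at most `keep` sample graphs rather than on every stored sample; how the original system encodes graphs as strings and scores matches is not stated in the abstract.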