Document Transformation System from Papers to XML Data Based on Pivot XML Document Method
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
ICAISC'10 Proceedings of the 10th international conference on Artificial intelligence and soft computing: Part II
Structure-preserving pipelines for digital libraries
LaTeCH '11 Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
This paper proposes a new method for logical structure analysis of document images, intended as the basis for a document reader that can extract logical information from a wide variety of printed documents. The proposed system consists of five basic modules: typography analysis, object recognition, object segmentation, object grouping, and object modification. Emergent computation, a key concept of artificial life, governs the cooperative interaction among these modules so that the system as a whole behaves effectively and flexibly. The method has three principal advantages over existing approaches: the system configuration adapts to diverse and complex logical structures, the analysis is robust because it tolerates errors in feature detection, and high-level logical information is fed back to low-level physical processing to improve accuracy. Experimental results on 150 documents show that the method is adaptable, robust, and effective across a variety of document structures.
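The abstract names five cooperating modules and describes feedback from high-level to low-level processing, but it does not give the interaction mechanism. As a rough illustration only, the Python sketch below replaces that mechanism with a plain fixed-point loop over a shared state. Each module may update the state and reports whether it changed anything, and analysis stops once a full pass changes nothing. This is not the paper's emergent-computation scheme. The `Block`/`State` types, the font-size feature, the 1.2 heading ratio, the eight-word heading limit, and every module body are hypothetical stand-ins that simply borrow the five module names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Block:
    text: str
    size: float                  # font size in points (hypothetical feature)
    label: Optional[str] = None  # "heading" or "body", set by recognition

@dataclass
class State:
    blocks: List[Block]
    body_size: float = 0.0
    heading_ratio: float = 1.2   # hypothetical threshold, raised by feedback

def typography_analysis(s: State) -> bool:
    """Body font size = the font size that covers the most characters."""
    if not s.blocks:
        return False
    chars = Counter()
    for b in s.blocks:
        chars[b.size] += len(b.text)
    size = max(chars, key=chars.get)
    changed = size != s.body_size
    s.body_size = size
    return changed

def object_recognition(s: State) -> bool:
    """Label blocks clearly larger than body text as headings."""
    changed = False
    for b in s.blocks:
        label = "heading" if b.size >= s.heading_ratio * s.body_size else "body"
        if label != b.label:
            b.label, changed = label, True
    return changed

def object_segmentation(s: State) -> bool:
    """Split raw multi-line blocks into one block per non-empty line."""
    out = [Block(line, b.size, b.label)
           for b in s.blocks for line in b.text.split("\n") if line]
    changed = [b.text for b in out] != [b.text for b in s.blocks]
    s.blocks = out
    return changed

def object_grouping(s: State) -> bool:
    """Merge runs of adjacent body blocks into a single paragraph block."""
    out: List[Block] = []
    for b in s.blocks:
        if out and b.label == "body" and out[-1].label == "body":
            out[-1] = Block(out[-1].text + " " + b.text, out[-1].size, "body")
        else:
            out.append(b)
    changed = len(out) != len(s.blocks)
    s.blocks = out
    return changed

def object_modification(s: State) -> bool:
    """Feedback: a 'heading' longer than 8 words is implausible, so raise
    the threshold used by the low-level recognition module."""
    if any(b.label == "heading" and len(b.text.split()) > 8 for b in s.blocks):
        s.heading_ratio += 0.1
        return True
    return False

MODULES: List[Callable[[State], bool]] = [
    typography_analysis, object_recognition, object_segmentation,
    object_grouping, object_modification,
]

def analyze(blocks: List[Block], max_rounds: int = 20) -> State:
    """Run every module in turn until one full pass changes nothing."""
    s = State(blocks)
    for _ in range(max_rounds):
        changed = False
        for module in MODULES:
            changed |= module(s)  # always call the module; no short-circuit
        if not changed:
            break
    return s

doc = [
    Block("Emergent Layout Analysis", 18),
    Block("A new method is proposed.\nModules cooperate.", 10),
    Block("Results", 14),
    Block("It works on many documents.", 10),
    Block("Large type used for emphasis, not for a section heading.", 12.5),
]
result = analyze(doc)
print([(b.label, b.text) for b in result.blocks])
```

In this toy run, the 12.5 pt line is labelled a heading on the first pass. `object_modification` then raises `heading_ratio`, and on the next pass `object_recognition` relabels the line as body text and `object_grouping` merges it into the preceding paragraph. This is a small instance of high-level information correcting a low-level decision. A fixed-point loop was chosen because it is the simplest scheme in which modules interact only through shared state. The paper's actual coordination mechanism may differ substantially.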