This paper presents a syntactic method for logical structure analysis that transforms multi-page document images with hierarchical structure into electronic documents based on SGML/XML. Previous approaches take text lines as their basic units; to produce a logical structure more accurately and quickly, the proposed parsing method instead takes hierarchically structured text regions as input. We also define a document model that efficiently describes the geometric characteristics and logical structure of documents, and we present a method for creating it automatically. Experiments on 372 images scanned from the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) show that the method performs logical structure analysis successfully and generates the document model automatically. Because the result of structural analysis is an SGML/XML document, the method improves the reusability and platform independence of documents.
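The last step mentioned in the abstract, emitting XML from a hierarchy of labeled text regions, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Region` class, the label names ("article", "section", "heading", and so on), and the sample text are hypothetical and are not taken from the paper's document model or parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import xml.etree.ElementTree as ET


@dataclass
class Region:
    """A text region carrying a hypothetical logical label and optional children."""
    label: str                      # hypothetical logical label, e.g. "section"
    text: Optional[str] = None      # recognized text, set only on leaf regions here
    children: List["Region"] = field(default_factory=list)


def to_xml(region: Region) -> ET.Element:
    """Recursively convert a labeled region hierarchy into an XML element tree."""
    element = ET.Element(region.label)
    if region.text is not None:
        element.text = region.text
    for child in region.children:
        element.append(to_xml(child))
    return element


# Hypothetical result of logical structure analysis for one short document.
page = Region("article", children=[
    Region("title", text="A Sample Title"),
    Region("section", children=[
        Region("heading", text="1 Introduction"),
        Region("paragraph", text="Body text of the first paragraph."),
    ]),
])

xml_string = ET.tostring(to_xml(page), encoding="unicode")
print(xml_string)
```

Keeping the logical labels as element names means the output can be consumed by any XML tool, which is the kind of reuse and platform independence the abstract refers to.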