This paper describes a new bottom-up method for document layout analysis. The algorithm was implemented in the CLiDE (Chemical Literature Data Extraction) system (http://chem.leeds.ac.uk/ICAMS/CLiDE.html), but the method is suitable for a broader range of documents. It is based on Kruskal's algorithm and uses a special distance metric between components to construct the physical page structure. The method has the major advantages of bottom-up systems: it is independent of text spacing and of block alignment. The algorithm's computational complexity is reduced to linear by using heuristics and path compression.
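
The general idea of Kruskal-style grouping with path compression can be illustrated with a minimal sketch. It is not the CLiDE implementation: the abstract does not define the distance metric or the heuristics, so the bounding-box gap `box_gap`, the cut-off `max_gap`, and the function name `cluster_components` are all assumptions made for illustration. The sketch also compares every pair of components, so it does not reach the linear complexity the paper reports.

```python
from itertools import combinations


def box_gap(a, b):
    """Assumed metric: Euclidean gap between two boxes (x0, y0, x1, y1).

    Returns 0 when the boxes touch or overlap. The paper's own
    "special distance metric" is not given in the abstract.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def find(parent, i):
    """Return the root of i's set, compressing the path on the way back."""
    root = i
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every node on the path directly at the root.
    while parent[i] != root:
        parent[i], i = root, parent[i]
    return root


def cluster_components(boxes, max_gap):
    """Group boxes by processing edges in increasing distance (Kruskal order).

    Edges longer than max_gap (a hypothetical threshold) are never merged,
    so the result is a forest whose trees are the layout blocks.
    """
    parent = list(range(len(boxes)))
    edges = sorted(
        (box_gap(boxes[i], boxes[j]), i, j)
        for i, j in combinations(range(len(boxes)), 2)
    )
    for dist, i, j in edges:
        if dist > max_gap:
            break  # all remaining edges are longer still
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[rj] = ri  # merge the two sets
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


# Two pairs of nearby boxes, with a wide gutter between the pairs.
boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (100, 0, 110, 10), (113, 0, 123, 10)]
print(cluster_components(boxes, max_gap=5))
```

With a small `max_gap` the two pairs stay separate blocks; raising it above the gutter width merges everything into one block, which shows how strongly the result depends on the assumed threshold.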