A Fast Algorithm for Bottom-Up Document Layout Analysis

Authors:
Anikó Simon;Jean-Christophe Pret;A. Peter Johnson
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
1997

Quantified Score

Hi-index	0.14

Abstract

This paper describes a new bottom-up method for document layout analysis. The algorithm was implemented in the CLiDE (Chemical Literature Data Extraction) system (http://chem.leeds.ac.uk/ICAMS/CLiDE.html), but the method is suitable for a broader range of documents. It is based on Kruskal's algorithm and uses a special distance metric between components to construct the physical page structure. The method has the major advantages of bottom-up systems: it is independent of text spacing and of block alignment. The algorithm's computational complexity is reduced to linear by using heuristics and path compression.
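
The general idea of Kruskal-style merging with path compression can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the paper's special distance metric or its heuristics, so the bounding-box gap used here (`box_gap`), the cut-off `max_gap`, and the function names are all hypothetical stand-ins. The sketch also compares every pair of components, so it does not reach the linear complexity the abstract reports.

```python
from itertools import combinations


def box_gap(a, b):
    """Euclidean gap between two boxes given as (x0, y0, x1, y1).

    Assumption: a plain geometric gap stands in for the paper's
    special distance metric, which the abstract does not specify.
    Returns 0.0 when the boxes touch or overlap.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def find(parent, i):
    """Return the root of i's set, compressing the path on the way."""
    root = i
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every node on the path directly at the root.
    while parent[i] != root:
        parent[i], i = root, parent[i]
    return root


def cluster_boxes(boxes, max_gap):
    """Group boxes by processing candidate edges in Kruskal order.

    Edges are sorted by increasing gap; an edge merges two groups only
    if its endpoints are not already connected. Merging stops at the
    hypothetical threshold max_gap, leaving one group per layout block.
    """
    edges = sorted(
        (box_gap(boxes[i], boxes[j]), i, j)
        for i, j in combinations(range(len(boxes)), 2)
    )
    parent = list(range(len(boxes)))
    for gap, i, j in edges:
        if gap > max_gap:
            break  # all remaining edges are longer still
        root_i, root_j = find(parent, i), find(parent, j)
        if root_i != root_j:
            parent[root_j] = root_i
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())
```

For example, three boxes placed 2 units apart on one line and two more boxes 90 units below them, clustered with `max_gap=5`, come out as two groups: the first three indices and the last two.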