Page Segmentation for Manhattan and Non-Manhattan Layout Documents via Selective CRLA

Authors:
Hung-Ming Sun
Affiliations:
Kainan University, Taoyuan, Taiwan, R.O.C.
Venue:
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Year:
2005

Abstract

The Constrained Run-Length Algorithm (CRLA) is a well-known technique for page segmentation. It is fast and can partition documents with Manhattan layouts, but it is not suited to pages whose layouts go beyond the Manhattan format, e.g., irregular halftone images embedded in text paragraphs. This paper presents a modified version of the CRLA, named selective CRLA, that can process documents with both Manhattan and non-Manhattan layouts. The selective CRLA is performed twice, with a different set of parameters each time, on a label image derived from the input document image. After the two passes, the resulting text regions are extracted. The proposed method has been successfully applied to extracting text from commercial magazine pages with complicated layouts.
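
For orientation, the run-length smoothing idea that the CRLA is built on can be sketched as follows. This is a minimal illustration of the classic, non-selective smoothing step only. It is not the selective CRLA described in the paper, whose label image and parameter sets are not given in the abstract. The function names (smooth_row, smooth_image), the threshold parameters, and the choice to leave background runs that touch the image border unfilled are all assumptions made for this example.

```python
from typing import List

Row = List[int]
Image = List[Row]


def smooth_row(row: Row, threshold: int) -> Row:
    """Fill background runs (0s) of length <= threshold that lie
    between two foreground pixels (1s).

    Assumption: runs touching the start or end of the row are left alone.
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_fg is not None:
                gap = i - last_fg - 1
                if 0 < gap <= threshold:
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def smooth_image(image: Image, h_threshold: int, v_threshold: int) -> Image:
    """Smooth rows and columns independently, then AND the two results."""
    horizontal = [smooth_row(r, h_threshold) for r in image]
    # Transpose, smooth each column as if it were a row, transpose back.
    columns = [smooth_row(list(c), v_threshold) for c in zip(*image)]
    vertical = [list(r) for r in zip(*columns)]
    return [
        [h & v for h, v in zip(h_row, v_row)]
        for h_row, v_row in zip(horizontal, vertical)
    ]


if __name__ == "__main__":
    # The gap of two 0s is filled; the gap of three 0s is not.
    print(smooth_row([1, 0, 0, 1, 0, 0, 0, 1], threshold=2))
```

Small thresholds merge characters into words and lines, while larger ones merge lines into blocks. The abstract's description of running the selective CRLA twice with different parameter sets is in the same spirit, although the actual values and the selection rule are not stated here.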