Adaptive document block segmentation and classification

Authors:
F. Y. Shih; Shy-Shyan Chen
Affiliations:
Comput. Vision Lab., New Jersey Inst. of Technol., Newark, NJ;-
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Year:
1996

Abstract

This paper presents an adaptive block segmentation and classification technique for office documents of the kind received daily, which have complex layout structures such as multiple columns and mixed-mode content combining text, graphics, and pictures. First, an improved two-step block segmentation algorithm based on run-length smoothing decomposes any document into single-mode blocks. Then, a rule-based block classifier assigns each block to the text, horizontal/vertical line, graphics, or picture type. The document features and rules used are independent of character font, character size, and scanning resolution. Experimental results show that our algorithms correctly segment and classify different types of mixed-mode printed documents.