On Segmentation of Documents in Complex Scripts

Authors:
K. S. Kumar; S. Kumar; C. Jawahar
Affiliations:
International Institute of Information Technology, Hyderabad, India; International Institute of Information Technology, Hyderabad, India; International Institute of Information Technology, Hyderabad, India
Venue:
ICDAR '07 Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02
Year:
2007

Abstract

Document image segmentation algorithms primarily aim at separating text and graphics in the presence of complex layouts. However, for many non-Latin scripts, segmentation becomes a challenge due to the characteristics of the script. In this paper, we empirically demonstrate that successful algorithms for Latin scripts may not be very effective for Indic and complex scripts. We explain this based on the differences in the spatial distribution of symbols in the scripts. We argue that the visual information used for segmentation needs to be enhanced with other information like script models for accurate results.