Automatic localization and correction of line segmentation errors

Authors:
Anand Mishra; Naveen Sankaran; Viresh Ranjan; C. V. Jawahar
Affiliations:
IIIT Hyderabad, India (all authors)
Venue:
Proceeding of the workshop on Document Analysis and Recognition
Year:
2012

Abstract

Text line segmentation is a basic step in any OCR system, and errors at this stage degrade the performance of the OCR engine. This is especially true for Indian languages because of the nature of their scripts. Many segmentation algorithms have been proposed in the literature, but they often fail to adapt dynamically to a given page and therefore tend to yield poor segmentation for specific regions or specific pages. In this work we design a text line segmentation post-processor that automatically localizes and corrects segmentation errors. The proposed post-processor, which works in a "learning by examples" framework, is not only independent of the underlying segmentation algorithm but also robust to the diversity of scanned pages. We show an improvement of over 5% in text line segmentation on a large dataset of scanned pages in multiple Indian languages.