A Statistically Based, Highly Accurate Text-Line Segmentation Method

Authors:
Jisheng Liang; Robert M. Haralick; Ihsin T. Phillips
Venue:
ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
Year:
1999

Citing 0
Cited 6

An Approach to Extracting the Target Text Line from a Document Image Captured by a Pen Scanner
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1

Segmentation of Bangla Unconstrained Handwritten Text
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2

Text - Image Separation in Devanagari Documents
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2

Segmentation and analysis of handwritten scripts from patients with neurological diseases
CompSysTech '03 Proceedings of the 4th international conference on Computer systems and technologies: e-Learning

Structuralizing digital ink for efficient selection
Proceedings of the 11th international conference on Intelligent user interfaces

Model-Guided Segmentation and Layout Labelling of Document Images Using a Hierarchical Conditional Random Field
PReMI '09 Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence

Abstract

This paper describes a probability-based technique for identifying and segmenting text lines. All probabilities are estimated from an extensive training set of various kinds of measurements of the distances between the terminal and non-terminal entities, and between the text-line and text-block entities, with which the algorithm works. The probabilities estimated off-line during training then drive all decisions in the on-line segmentation algorithm. On the UW-III database of some 1,600 scanned document image pages containing 105,020 text lines, the algorithm identifies and segments 104,773 correctly, an accuracy of 99.76%.
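
The reported accuracy can be checked directly from the two line counts given in the abstract. The short sketch below does only that arithmetic; the variable names are illustrative and are not taken from the paper.

```python
# Line counts as stated in the abstract (UW-III database).
total_lines = 105_020      # text lines in the database
correct_lines = 104_773    # lines identified and segmented correctly

# Lines the algorithm did not segment correctly.
missed_lines = total_lines - correct_lines

# Accuracy as a percentage of all text lines.
accuracy = 100 * correct_lines / total_lines

print(f"{missed_lines} lines missed, accuracy {accuracy:.2f}%")
# prints "247 lines missed, accuracy 99.76%"
```

In other words, the 99.76% figure corresponds to 247 incorrectly handled lines out of 105,020.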