Text Line Segmentation and Word Recognition in a System for General Writer Independent Handwriting Recognition

Venue: ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Year: 2001

Citing 0
Cited 10

Segmentation of the Date in Entries of Historical Church Registers
Proceedings of the 24th DAGM Symposium on Pattern Recognition

Recognition of Cursive Roman Handwriting - Past, Present and Future
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1

Word Segmentation of Handwritten Dates in Historical Documents by Combining Semantic A-Priori-Knowledge with Local Features
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1

A Recognition and Verification Strategy for Handwritten Word Recognition
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1

A Scale Space Approach for Automatically Segmenting Words from Historical Handwritten Documents
IEEE Transactions on Pattern Analysis and Machine Intelligence

Tree Structure for Word Extraction from Handwritten Text Lines
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition

Text line and word segmentation of handwritten documents
Pattern Recognition

Handwritten document image segmentation into text lines and words
Pattern Recognition

Handwritten Arabic text line segmentation using affinity propagation
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems

A line-based representation for matching words in historical manuscripts
Pattern Recognition Letters

Abstract

In this paper we present a system for recognizing unconstrained English handwritten text based on a large vocabulary. We describe the three main components of the system: preprocessing, feature extraction and recognition. In the preprocessing phase the handwritten texts are first segmented into lines. Each line of text is then normalized with respect to skew, slant, vertical position and width. After these steps, the text lines are segmented into single words. For this purpose the distances between connected components are measured. Using a threshold, the distances are divided into distances within a word and distances between different words, and a line of text is segmented at the positions where the distance exceeds the chosen threshold. From each image representing a single word, a sequence of features is extracted. These features are the input to a recognition procedure based on hidden Markov models. To investigate the stability of the segmentation algorithm, the threshold that separates intra-word from inter-word distances is varied. If the threshold is small, many errors are caused by over-segmentation, while for large thresholds under-segmentation errors occur. The best segmentation performance is 95.56% correctly segmented words, tested on 541 text lines containing 3899 words. At this segmentation rate of 95.56%, a word-level recognition rate of 73.45% is achieved.
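
The threshold-based word segmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each connected component is reduced to its horizontal extent (left, right) and that the distance between neighbouring components is the plain horizontal gap, whereas the abstract does not specify the distance measure. The names `segment_words`, `Box` and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

# Assumption: a connected component is represented only by its
# horizontal extent (left, right) within the normalized text line.
Box = Tuple[int, int]


def segment_words(components: List[Box], threshold: float) -> List[List[Box]]:
    """Group components into words by thresholding horizontal gaps.

    A gap larger than `threshold` is treated as an inter-word distance
    and starts a new word; smaller gaps are treated as intra-word.
    """
    if not components:
        return []
    ordered = sorted(components)          # left-to-right order
    words = [[ordered[0]]]
    right_edge = ordered[0][1]            # rightmost pixel seen so far
    for box in ordered[1:]:
        gap = box[0] - right_edge         # negative if components overlap
        if gap > threshold:
            words.append([box])           # inter-word gap: cut here
        else:
            words[-1].append(box)         # intra-word gap: same word
        right_edge = max(right_edge, box[1])
    return words


# Hypothetical line with gaps of 2, 15, 2 and 20 pixels.
line = [(0, 10), (12, 20), (35, 50), (52, 60), (80, 95)]
for t in (1, 8, 30):
    print(t, len(segment_words(line, t)))
```

On this made-up line, a threshold of 1 cuts at every gap and yields five groups (over-segmentation), a threshold of 8 yields three groups, and a threshold of 30 merges everything into one group (under-segmentation), which mirrors the trade-off the abstract reports when the threshold is varied.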