Word Segmentation in Handwritten Korean Text Lines Based on Gap Clustering Techniques

Authors:
Affiliations:
Venue: ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Year: 2001

Cited by 6:

Segmentation of the Date in Entries of Historical Church Registers
Proceedings of the 24th DAGM Symposium on Pattern Recognition

Word Segmentation of Handwritten Dates in Historical Documents by Combining Semantic A-Priori-Knowledge with Local Features
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1

Tree Structure for Word Extraction from Handwritten Text Lines
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition

Word Separation of Unconstrained Handwritten Text Lines in PCR Forms
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition

Handwritten word-spotting using hidden Markov models and universal vocabularies
Pattern Recognition

Text line and word segmentation of handwritten documents
Pattern Recognition

Abstract

We propose a word segmentation method for handwritten Korean text lines. The method uses gap information to split a text line into word units, where a gap is defined as a white run in the vertical projection of the line image. Each gap is classified as either a between-word gap or a within-word gap using a clustering technique. We consider three gap metrics (BB, RLE, and CH) that are known to perform well in Roman-style word segmentation, and three clustering techniques (average linkage, modified MAX, and sequential clustering). An experiment on 498 text line images extracted from live mail pieces shows that the best performance is obtained by sequential clustering using all three gap metrics.
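
The gap-based pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binarized line image given as rows of 0 (background) and 1 (ink), it uses the plain white-run width as the gap measure instead of the BB, RLE, or CH metrics, and it uses a basic average-linkage clustering stopped at two clusters. The names `find_gaps` and `classify_gaps` are hypothetical.

```python
from typing import List, Tuple

Gap = Tuple[int, int]  # (start_column, width) of a white run


def find_gaps(image: List[List[int]]) -> List[Gap]:
    """Return the interior white runs of the vertical projection.

    `image` is a list of rows; 1 marks ink, 0 marks background (assumption).
    """
    n_cols = len(image[0])
    # Vertical projection: count of ink pixels in each column.
    profile = [sum(row[c] for row in image) for c in range(n_cols)]
    gaps: List[Gap] = []
    c = 0
    while c < n_cols:
        if profile[c] == 0:
            start = c
            while c < n_cols and profile[c] == 0:
                c += 1
            # Runs touching the left or right edge are margins, not gaps.
            if start > 0 and c < n_cols:
                gaps.append((start, c - start))
        else:
            c += 1
    return gaps


def classify_gaps(widths: List[int]) -> List[bool]:
    """Label each gap True (between-word) or False (within-word).

    Gap widths are merged bottom-up with average linkage until two
    clusters remain; the cluster with the larger mean width is taken
    to hold the between-word gaps.
    """
    n = len(widths)
    if n < 2:
        return [False] * n
    clusters = [[i] for i in range(n)]  # clusters hold gap indices

    def linkage(a: List[int], b: List[int]) -> float:
        # Average pairwise distance between the members of two clusters.
        total = sum(abs(widths[i] - widths[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 2:
        pairs = [(p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    def mean_width(cluster: List[int]) -> float:
        return sum(widths[i] for i in cluster) / len(cluster)

    # If both clusters have the same mean, there is no evidence of a word gap.
    if mean_width(clusters[0]) == mean_width(clusters[1]):
        return [False] * n
    wide = max(clusters, key=mean_width)
    return [i in wide for i in range(n)]
```

For a one-row toy image built from the string `##.##.....##.###` (with `#` as ink), `find_gaps` yields three white runs of widths 1, 5, and 1, and `classify_gaps` marks only the middle one as a between-word gap. Forcing exactly two clusters assumes the line contains both kinds of gap, so a single-word line may still be split wherever its gap widths differ.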