Adaptive Word Style Classification Using a Gaussian Mixture Model

Authors:
Huanfeng Ma;David Doermann
Affiliations:
University of Maryland, College Park;University of Maryland, College Park
Venue:
ICPR '04: Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Volume 2
Year:
2004

Abstract

In this paper, we present a new approach to detecting bold and italic words in scanned documents. Under the assumption that OCR results are available, the features used for classification are chosen automatically by a feature selection step. For each scanned page, a Gaussian Mixture Model is constructed for the characters that share the same character code, and each word's style is determined by a weighted majority vote. We applied this method to a variety of documents and compared the results with current commercial OCR software that provides style information. The experimental results show that our method outperforms the commercial software.
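
The per-page pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which features are selected, how many mixture components are used, or how votes are weighted. The sketch below assumes a single hypothetical feature per character (for example a stroke-width estimate), a two-component one-dimensional mixture per character code, and a vote weight equal to how far each character's posterior is from 0.5. The names `fit_gmm_1d`, `classify_words`, and `min_samples` are illustrative.

```python
import math
from collections import defaultdict


def _posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given the value x."""
    dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    if total == 0.0:
        return [0.5, 0.5]  # no evidence either way
    return [d / total for d in dens]


def fit_gmm_1d(values, iters=50, min_var=1e-3):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]  # assumption: initialise the components at the extremes
    var0 = max(((hi - lo) / 2) ** 2, min_var)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each value
        resp = [_posteriors(x, weights, means, variances) for x in values]
        # M-step: re-estimate weight, mean, and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values))
            variances[k] = max(spread / nk, min_var)
    return weights, means, variances


def classify_words(words, min_samples=4):
    """words: list of words, each a list of (char_code, feature_value) pairs."""
    # Group feature values by character code across the whole page.
    by_code = defaultdict(list)
    for word in words:
        for code, value in word:
            by_code[code].append(value)
    # One mixture per character code that occurs often enough on the page.
    models = {code: fit_gmm_1d(vals)
              for code, vals in by_code.items() if len(vals) >= min_samples}
    labels = []
    for word in words:
        score = 0.0
        for code, value in word:
            if code not in models:
                continue  # too few samples of this character to model
            weights, means, variances = models[code]
            post = _posteriors(value, weights, means, variances)
            heavy = 0 if means[0] > means[1] else 1  # assumed "bold" component
            score += post[heavy] - 0.5  # confident characters weigh more
        labels.append("bold" if score > 0 else "normal")
    return labels


# Hypothetical page: each word is a list of (character code, stroke width).
page = [
    [("a", 2.0), ("b", 2.2)],
    [("a", 2.1), ("b", 2.1)],
    [("a", 3.6), ("b", 3.8)],
    [("a", 1.9), ("b", 2.3)],
    [("a", 3.5), ("b", 3.9)],
    [("a", 2.0), ("b", 2.2)],
]
labels = classify_words(page)
```

Fitting one model per character code on each page lets the classifier adapt to that page's font, since a bold "a" is compared only with other instances of "a" from the same scan. A simplification worth noting: this sketch always fits two components, so it would split the characters of a page that contains only one style; a practical version would need a check for that case.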