Unsupervised language model adaptation for handwritten Chinese text recognition

Authors:
Qiu-Feng Wang;Fei Yin;Cheng-Lin Liu
Venue:
Pattern Recognition
Year:
2014

Abstract

This paper presents an approach to unsupervised language model adaptation (LMA) using multiple models for offline recognition of unconstrained handwritten Chinese text. Because the domain of the document to be recognized varies and is usually unknown a priori, we use a two-pass recognition strategy with a predefined set of multi-domain language models. We propose three methods for dynamically generating an adaptive language model that matches the text output by first-pass recognition: model selection, model combination, and model reconstruction. In model selection, we choose the language model with the minimum perplexity on the first-pass recognized text. In model combination, we learn the combination weights by minimizing the sum of squared errors with both L2-norm and L1-norm regularization. In model reconstruction, we use a set of orthogonal bases to reconstruct a language model, with coefficients learned to match the document to be recognized. Moreover, we reduce the storage size of the multiple language models using two compression methods: split vector quantization (SVQ) and principal component analysis (PCA). Comprehensive experiments on two public Chinese handwriting databases, CASIA-HWDB and HIT-MW, show that the proposed unsupervised LMA approach improves recognition performance, particularly for ancient-domain documents, where recognition accuracy improves by 7 percent. Meanwhile, combining the two compression methods greatly reduces the storage size of the language models with little loss of recognition accuracy.
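
The model selection step can be illustrated with a minimal sketch. It is not the authors' implementation: the add-alpha smoothed character bigram models, the two toy "domains", and the names `train_bigram`, `perplexity`, and `select_model` are all assumptions made for illustration. The sketch only shows the idea stated in the abstract, namely scoring the first-pass recognized text under each domain model and keeping the model with the lowest perplexity.

```python
import math
from collections import Counter


def train_bigram(text, vocab, alpha=1.0):
    """Return an add-alpha smoothed character bigram probability function.

    This is a toy stand-in for a domain language model (an assumption,
    not the language models used in the paper).
    """
    context_counts = Counter(text[:-1])          # counts of each preceding character
    bigram_counts = Counter(zip(text, text[1:]))  # counts of adjacent character pairs
    vocab_size = len(vocab)

    def prob(prev, cur):
        return (bigram_counts[(prev, cur)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )

    return prob


def perplexity(prob, text):
    """Perplexity of `text` under a bigram probability function."""
    pairs = list(zip(text, text[1:]))
    log_sum = sum(math.log(prob(prev, cur)) for prev, cur in pairs)
    return math.exp(-log_sum / len(pairs))


def select_model(models, first_pass_text):
    """Pick the model name with minimum perplexity on the first-pass text."""
    scores = {name: perplexity(p, first_pass_text) for name, p in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical toy corpora standing in for a multi-domain language model set.
corpora = {"modern": "abababababab", "ancient": "cdcdcdcdcdcd"}
vocab = set("".join(corpora.values()))
models = {name: train_bigram(text, vocab) for name, text in corpora.items()}

# Simulated first-pass recognition output; the last character is a recognition error.
first_pass = "cdcdcc"
best, scores = select_model(models, first_pass)
print(best, scores)
```

In a two-pass setup of this kind, the selected model would then be used to re-recognize the same document in the second pass. Because the first-pass text contains recognition errors, the perplexity scores are noisy estimates, which is one motivation for the softer combination and reconstruction methods described in the abstract.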