This paper presents an effective approach to unsupervised language model adaptation (LMA) using multiple models for offline recognition of unconstrained handwritten Chinese text. Because the domain of a document to be recognized varies and is usually unknown a priori, we use a two-pass recognition strategy with a predefined set of multi-domain language models. We propose three methods for dynamically generating an adaptive language model that matches the text output by the first recognition pass: model selection, model combination, and model reconstruction. In model selection, we use the language model with the minimum perplexity on the first-pass recognized text. In model combination, we learn the combination weights by minimizing the sum of squared errors with both L2-norm and L1-norm regularization. In model reconstruction, we use a group of orthogonal bases to reconstruct a language model, with the coefficients learned to match the document to be recognized. Moreover, we reduce the storage size of the multiple language models using two compression methods: split vector quantization (SVQ) and principal component analysis (PCA). Comprehensive experiments on two public Chinese handwriting databases, CASIA-HWDB and HIT-MW, show that the proposed unsupervised LMA approach improves recognition performance substantially, particularly for ancient-domain documents, whose recognition accuracy improves by 7 percent. Meanwhile, combining the two compression methods greatly reduces the storage size of the language models with little loss of recognition accuracy.
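Model selection amounts to scoring the first-pass text with each domain model and keeping the model with the lowest perplexity. The sketch below assumes character bigram models with add-k smoothing. The class `BigramLM`, the functions `perplexity` and `select_model`, and the two toy corpora are hypothetical illustrations, not the models or data used in the experiments.

```python
import math
from collections import Counter


class BigramLM:
    """Hypothetical character bigram model with add-k smoothing (illustration only)."""

    def __init__(self, corpus, vocab_size, k=0.1):
        self.vocab_size = vocab_size
        self.k = k
        self.history = Counter()  # how often each symbol appears as a bigram history
        self.pairs = Counter()    # bigram counts keyed by (prev, cur)
        for sentence in corpus:
            chars = ["<s>"] + list(sentence)
            for prev, cur in zip(chars, chars[1:]):
                self.history[prev] += 1
                self.pairs[(prev, cur)] += 1

    def prob(self, prev, cur):
        # Add-k smoothed conditional probability P(cur | prev).
        return (self.pairs[(prev, cur)] + self.k) / (
            self.history[prev] + self.k * self.vocab_size
        )


def perplexity(lm, text):
    """Per-character perplexity of a non-empty `text` under `lm`."""
    chars = ["<s>"] + list(text)
    log_prob = sum(math.log(lm.prob(p, c)) for p, c in zip(chars, chars[1:]))
    return math.exp(-log_prob / (len(chars) - 1))


def select_model(models, first_pass_text):
    """Return the name of the domain model with minimum perplexity on the first-pass text."""
    return min(models, key=lambda name: perplexity(models[name], first_pass_text))


# Toy usage with two hypothetical domain corpora.
corpora = {
    "ancient": ["子曰学而时习之", "学而不思则罔"],
    "news": ["今天股市上涨", "经济增长加快"],
}
first_pass = "学而时习之"
vocab = set(first_pass).union(*(set("".join(c)) for c in corpora.values()))
models = {name: BigramLM(c, len(vocab)) for name, c in corpora.items()}
print(select_model(models, first_pass))  # ancient
```

In a real system the domain models would be trained on large corpora and scored on the full first-pass transcript, but the selection rule itself stays a single `min` over perplexities.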
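Minimizing a sum of squared errors with both L1-norm and L2-norm penalties is an elastic-net problem. A minimal coordinate-descent sketch follows. It assumes, hypothetically, that each row of `X` holds one n-gram's probabilities under the domain models and that `y` holds target probabilities estimated from the first-pass text. The abstract does not specify these quantities, and `combination_weights`, `lam1`, and `lam2` are illustrative names.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm (shrinks toward zero by `threshold`)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def combination_weights(X, y, lam1, lam2, sweeps=200):
    """Coordinate descent for
        0.5 * ||y - X w||^2 + lam1 * ||w||_1 + 0.5 * lam2 * ||w||^2.

    X[i][j]: probability of n-gram i under domain model j (hypothetical layout).
    y[i]:    target probability of n-gram i estimated from the first-pass text.
    """
    n_rows, n_models = len(X), len(X[0])
    w = [0.0] * n_models
    col_sq = [sum(X[i][j] ** 2 for i in range(n_rows)) for j in range(n_models)]
    for _ in range(sweeps):
        for j in range(n_models):
            # Correlation of column j with the residual that excludes model j.
            rho = 0.0
            for i in range(n_rows):
                others = sum(X[i][k] * w[k] for k in range(n_models) if k != j)
                rho += X[i][j] * (y[i] - others)
            denom = col_sq[j] + lam2
            # Closed-form minimizer of the one-dimensional subproblem.
            w[j] = soft_threshold(rho, lam1) / denom if denom > 0 else 0.0
    return w


# Toy usage: the target is exactly 0.7 * model_a + 0.3 * model_b.
model_a = [0.5, 0.3, 0.2, 0.0]
model_b = [0.1, 0.1, 0.3, 0.5]
X = [list(row) for row in zip(model_a, model_b)]
y = [0.7 * a + 0.3 * b for a, b in zip(model_a, model_b)]
print(combination_weights(X, y, lam1=0.0, lam2=0.0))  # close to [0.7, 0.3]
```

Coordinate descent suits this objective because the L1 term is not differentiable at zero, yet each one-dimensional subproblem has the closed-form soft-threshold solution used above. Increasing `lam1` drives the weights of weakly matching models exactly to zero, while `lam2` shrinks all weights smoothly.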
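One way to picture reconstruction from orthogonal bases: center the domain models' probability vectors, orthonormalize them, and project the first-pass target onto the resulting bases. With orthonormal bases, the least-squares coefficients are plain inner products. The abstract does not say how its bases are built, so this sketch assumes modified Gram-Schmidt (PCA would give a different orthogonal set). The vector layout, the names `orthonormal_bases` and `reconstruct`, and the final clip-and-renormalize step are all assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_bases(vectors, tol=1e-10):
    """Modified Gram-Schmidt; drops vectors that are (nearly) linearly dependent."""
    bases = []
    for v in vectors:
        r = list(v)
        for b in bases:
            proj = dot(r, b)
            r = [ri - proj * bi for ri, bi in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:
            bases.append([ri / norm for ri in r])
    return bases


def reconstruct(domain_vectors, target, floor=1e-6):
    """Reconstruct an adapted probability vector from orthogonal bases.

    domain_vectors: one probability vector per domain model (hypothetical layout).
    target: probability vector estimated from the first-pass text.
    Returns (coefficients, reconstructed distribution).
    """
    dim, m = len(target), len(domain_vectors)
    mean = [sum(v[i] for v in domain_vectors) / m for i in range(dim)]
    centered = [[v[i] - mean[i] for i in range(dim)] for v in domain_vectors]
    bases = orthonormal_bases(centered)
    offset = [target[i] - mean[i] for i in range(dim)]
    # Orthonormal bases: least-squares coefficients are inner products.
    coeffs = [dot(offset, b) for b in bases]
    recon = list(mean)
    for c, b in zip(coeffs, bases):
        recon = [ri + c * bi for ri, bi in zip(recon, b)]
    # Assumption: clip negatives and renormalize so the output is a distribution.
    recon = [max(ri, floor) for ri in recon]
    total = sum(recon)
    return coeffs, [ri / total for ri in recon]


domains = [
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.4],
    [0.25, 0.25, 0.25, 0.25],
]
mixture = [0.6 * a + 0.4 * b for a, b in zip(domains[0], domains[1])]
coeffs, adapted = reconstruct(domains, mixture)
```

Because centering makes the domain vectors sum to zero, `m` domain models yield at most `m - 1` bases; a target that lies in their affine span (such as `mixture` above) is reconstructed exactly, and anything else is replaced by its closest point in that span.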
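Split vector quantization replaces full-precision values with a few small codebook indices per entry. The sketch assumes, hypothetically, that each row is one n-gram's vector of log-probabilities across the domain models. Each row is cut into chunks of `sub_dim` values, and every chunk position gets its own k-means codebook. `svq_encode`, `svq_decode`, and the deterministic initialization are illustrative choices, not the configuration evaluated in the experiments.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, initialized with the first k distinct points."""
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def svq_encode(vectors, sub_dim, k):
    """Quantize each chunk position with its own k-entry codebook.
    Returns (codebooks, codes) where codes[row] lists one index per chunk."""
    dim = len(vectors[0])
    codebooks, codes = [], [[] for _ in vectors]
    for start in range(0, dim, sub_dim):
        chunks = [list(v[start:start + sub_dim]) for v in vectors]
        codebook = kmeans(chunks, k)
        codebooks.append(codebook)
        for row, chunk in zip(codes, chunks):
            row.append(nearest(chunk, codebook))
    return codebooks, codes


def svq_decode(codebooks, codes):
    """Concatenate the codewords selected for each row."""
    return [[x for cb, c in zip(codebooks, row) for x in cb[c]] for row in codes]


# Toy usage: four rows of log-probabilities under four hypothetical domain models.
log_probs = [
    [-1.0, -2.0, -3.0, -4.0],
    [-1.0, -2.0, -5.0, -6.0],
    [-7.0, -8.0, -3.0, -4.0],
    [-7.0, -8.0, -5.0, -6.0],
]
codebooks, codes = svq_encode(log_probs, sub_dim=2, k=2)
print(codes)  # [[0, 0], [0, 1], [1, 0], [1, 1]]
```

With `k` codewords per chunk, each chunk costs about log2(k) bits plus a shared codebook, which is where the storage saving comes from. Decoding is lossy whenever more than `k` distinct chunks occur at the same position; the toy data has only two per position, so it round-trips exactly.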