Bigram language models are widely used in language-processing applications for both Indo-European and Asian languages. However, when a Chinese language model is applied to a novel domain, its accuracy drops significantly, from 96% to 78% in our evaluation. We apply pattern recognition techniques (Bayesian, decision tree, and neural network classifiers) to detect language model errors, examining two general types of features: model-based and language-specific. In our evaluation, the Bayesian classifier achieves the best recall (80%) but low precision (60%). The neural network achieves good recall (75%) and good precision (80%), but both the Bayesian and neural network classifiers have a low skip ratio (65%). The decision tree classifier achieves the best precision (81%) and skip ratio (76%), but the lowest recall (73%).
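The model-based side of this idea can be sketched with a toy example: train a bigram model with add-one smoothing, then flag positions whose bigram probability falls below a threshold as candidate recognition errors. This is only an illustrative sketch of a model-based feature, not the paper's actual classifiers; the function names and the threshold value are our own assumptions.

```python
from collections import Counter

def train_bigram(corpus):
    """Estimate add-one-smoothed bigram probabilities P(w2 | w1)
    from a corpus given as a list of token lists."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence          # sentence-start marker
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def prob(w1, w2):
        # Add-one (Laplace) smoothing so unseen bigrams get nonzero mass.
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

    return prob

def flag_errors(sentence, prob, threshold=0.2):
    """Return indices of bigrams whose probability falls below the
    threshold -- crude candidates for recognizer errors."""
    tokens = ["<s>"] + sentence
    return [i for i, pair in enumerate(zip(tokens, tokens[1:]))
            if prob(*pair) < threshold]

# Toy usage: an in-domain sentence passes, an out-of-domain word is flagged.
prob = train_bigram([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(flag_errors(["the", "cat", "sat"], prob, 0.21))    # clean sentence
print(flag_errors(["the", "zebra", "sat"], prob, 0.21))  # unseen word flagged
```

In the paper's setting such low-probability positions would be one feature among several (alongside language-specific features) fed to the Bayesian, decision tree, or neural network classifier, rather than a decision rule on their own.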