This article presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques such as trigram language models to Chinese is challenging for three reasons: (1) there is no standard definition of a word in Chinese; (2) word boundaries are not marked by spaces; and (3) training data is scarce. Our unified approach automatically and consistently gathers a high-quality training set from the Web, creates a high-quality lexicon, segments the training data using this lexicon, and compresses the language model. Every step is driven by the maximum likelihood principle, the same criterion used in trigram model training. We show that each method improves on standard SLM, and that the combined method yields the best pinyin conversion result reported.
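To make the idea of lexicon-based segmentation under the maximum likelihood principle concrete, here is a minimal sketch in Python. It is an illustration, not the article's method: it scores candidate segmentations with a unigram model rather than a trigram model, and the names `segment`, `toy_counts`, and `max_word_len`, along with all counts in the toy lexicon, are assumptions made up for this example.

```python
import math


def segment(text, word_logprob, max_word_len=4):
    """Return the segmentation of `text` with the highest total log-probability.

    Assumption: words are scored independently (a unigram model). This keeps
    the dynamic program short; it is a simplification of trigram scoring.
    """
    n = len(text)
    # best[i] = (score of the best segmentation of text[:i], start of its last word)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word not in word_logprob:
                continue
            score = best[start][0] + word_logprob[word]
            if score > best[end][0]:
                best[end] = (score, start)
    if best[n][0] == -math.inf:
        raise ValueError("text cannot be segmented with this lexicon")
    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    pos = n
    while pos > 0:
        start = best[pos][1]
        words.append(text[start:pos])
        pos = start
    return words[::-1]


# Hypothetical lexicon with invented counts, for illustration only.
toy_counts = {"研究": 50, "研究生": 10, "生命": 30, "命": 5, "起源": 20}
total = sum(toy_counts.values())
# Maximum likelihood estimate of each word probability: count / total.
toy_logprob = {w: math.log(c / total) for w, c in toy_counts.items()}

result = segment("研究生命起源", toy_logprob)
print(result)
```

With this toy lexicon the string admits two segmentations, 研究 / 生命 / 起源 and 研究生 / 命 / 起源, and the sketch picks the first because its relative-frequency estimates give it the higher likelihood. A fuller treatment would replace the unigram score with a trigram score over the word sequence and would need a fallback for characters missing from the lexicon.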