We propose a new unsupervised training method for acquiring probability models that accurately segment Chinese character sequences into words. By constructing a core lexicon to guide unsupervised word learning, self-supervised segmentation overcomes the local maxima problems that hamper standard EM training. Our procedure uses successive EM phases to learn a good probability model over character strings, and then prunes this model with a mutual information selection criterion to obtain a more accurate word lexicon. The segmentations produced by these models are more accurate than those produced by training with EM alone.
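The pruning step can be illustrated with a minimal sketch. The abstract does not give the exact selection criterion, so everything here is an assumption: the function names `mutual_information` and `prune_lexicon`, the threshold, and the choice to score a candidate word by its weakest two-way split using pointwise mutual information, log p(w) / (p(a) p(b)). A word whose parts explain it as well as the word itself does is dropped from the lexicon.

```python
import math


def mutual_information(word, prob):
    """Lowest pointwise MI over all two-way splits of `word` (assumed criterion).

    `prob` maps candidate strings to probabilities, e.g. as estimated by EM.
    Splits whose halves are not both in the lexicon are skipped.
    """
    scores = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in prob and right in prob:
            scores.append(math.log(prob[word] / (prob[left] * prob[right])))
    # With no scorable split there is no evidence against the word.
    return min(scores) if scores else float("inf")


def prune_lexicon(prob, threshold=0.0):
    """Keep single characters and words whose MI exceeds `threshold`, then renormalize."""
    kept = {
        w: p
        for w, p in prob.items()
        if len(w) == 1 or mutual_information(w, prob) > threshold
    }
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}


# Hypothetical toy lexicon: "ab" is less likely than "a" followed by "b",
# so it is pruned; "cd" has no scorable split and is kept.
toy = {"a": 0.4, "b": 0.4, "ab": 0.1, "cd": 0.1}
pruned = prune_lexicon(toy)
```

In this toy case the surviving entries are renormalized to sum to one, so the pruned lexicon can be used directly as a probability model for a further EM phase or for segmentation.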