An example-based study on Chinese word segmentation using critical fragments
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
As in other NLP applications, a serious problem in Chinese word segmentation lies in the ambiguities involved. Disambiguation methods fall into different categories, e.g., rule-based, statistical and example-based approaches, each of which may involve a variety of machine learning techniques. In this paper we report our current progress within the example-based approach, including its framework, example representation and collection, and example matching and application. Experimental results show that this approach resolves more than 90% of the ambiguities found. Hence, if it is integrated effectively with a segmentation method whose precision P is at least 95%, the resulting segmentation accuracy can, theoretically, exceed 99.5%.
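The 99.5% figure can be reproduced with a short calculation. The sketch below is an illustration only: the abstract does not give the formula, so it assumes that every error left by the baseline segmenter stems from an ambiguity and that the example-based step resolves a fixed fraction of those errors. The function name `combined_accuracy` and its parameters are hypothetical.

```python
def combined_accuracy(base_precision: float, resolution_rate: float) -> float:
    """Estimate accuracy after disambiguation (illustrative assumption, not the paper's formula).

    Assumes all baseline errors are ambiguity errors and that a fraction
    `resolution_rate` of them is corrected by the disambiguation step.
    """
    # Errors that remain: baseline error rate times the unresolved fraction.
    residual_error = (1.0 - base_precision) * (1.0 - resolution_rate)
    return 1.0 - residual_error


if __name__ == "__main__":
    # Baseline precision of 95% and 90% of ambiguities resolved.
    print(f"{combined_accuracy(0.95, 0.90):.4f}")
```

With a baseline precision of 95% and a resolution rate of 90%, the residual error is 0.05 × 0.10 = 0.005, which gives the 99.5% bound; any higher precision or resolution rate pushes the estimate above it.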