Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Computational Linguistics
This paper presents a Chinese word segmentation system that uses improved source-channel models of Chinese sentence generation. Chinese words are defined as one of four types: lexicon words, morphologically derived words, factoids, and named entities. The system provides a unified approach to four fundamental tasks of word-level Chinese language processing: (1) word segmentation, (2) morphological analysis, (3) factoid detection, and (4) named entity recognition. Its performance is evaluated on a manually annotated test set and compared with several state-of-the-art systems, taking into account that the definition of a Chinese word often varies from system to system.
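
As a rough illustration of the source-channel idea, the sketch below picks the segmentation that maximizes the product of a class prior and a channel probability over all candidate words. It is a minimal toy under stated assumptions, not the system described in the paper: the lexicon, the probabilities, the class names (LW, NUMBER, UNK), the unigram source model, and the regular expression for number factoids are all hypothetical and chosen only for the example.

```python
import math
import re

# Hypothetical toy lexicon: word -> unigram probability (illustrative values only).
LEXICON = {
    "研究": 0.02, "研究生": 0.01, "生命": 0.015,
    "生": 0.003, "命": 0.002, "起源": 0.01,
}
# Hypothetical prior for a single factoid class; real systems use many classes.
NUMBER_PRIOR = 0.01
NUMBER_RE = re.compile(r"\d+")
# Fallback score so that any single character can always be emitted.
UNKNOWN_CHAR_LOGP = math.log(1e-6)
MAX_WORD_LEN = 4  # longest candidate substring considered


def candidates(sub):
    """Yield (class, log score) pairs for one candidate substring."""
    if sub in LEXICON:
        # Lexicon word: the channel probability is taken to be 1.
        yield "LW", math.log(LEXICON[sub])
    if NUMBER_RE.fullmatch(sub):
        # Factoid: class prior times a channel probability of 1 on a match.
        yield "NUMBER", math.log(NUMBER_PRIOR)
    if len(sub) == 1:
        yield "UNK", UNKNOWN_CHAR_LOGP


def segment(sentence):
    """Return the highest-scoring list of (word, class) pairs."""
    n = len(sentence)
    # best[i] = (best log score of sentence[:i], start of last word, its class)
    best = [(-math.inf, 0, "")] * (n + 1)
    best[0] = (0.0, 0, "")
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            for cls, logp in candidates(sentence[start:end]):
                score = best[start][0] + logp
                if score > best[end][0]:
                    best[end] = (score, start, cls)
    # Follow the back-pointers from the end of the sentence.
    result = []
    pos = n
    while pos > 0:
        _, start, cls = best[pos]
        result.append((sentence[start:pos], cls))
        pos = start
    return result[::-1]


if __name__ == "__main__":
    print(segment("研究生命起源"))
    print(segment("2005年"))
```

The dynamic program treats lexicon words, factoids, and unknown characters as competing candidates in a single search, which is one simple way to read "unified approach". A fuller model would condition each class on its neighbors and would add candidates for morphologically derived words and named entities.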