Improved source-channel models for Chinese word segmentation

Authors:
Jianfeng Gao; Mu Li; Chang-Ning Huang
Affiliations:
Microsoft Research, Asia, Beijing, China (all authors)
Venue:
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Year:
2003

Abstract

This paper presents a Chinese word segmentation system that uses improved source-channel models of Chinese sentence generation. Each Chinese word is defined as one of four types: lexicon words, morphologically derived words, factoids, and named entities. Our system provides a unified approach to the four fundamental features of word-level Chinese language processing: (1) word segmentation, (2) morphological analysis, (3) factoid detection, and (4) named entity recognition. The performance of the system is evaluated on a manually annotated test set and is also compared with several state-of-the-art systems, taking into account that the definition of Chinese words often varies from system to system.
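As a rough illustration of the source-channel idea, and not the authors' system, the sketch below picks the word sequence W that maximizes P(W) · P(S|W) for a character string S. It assumes a unigram source model and a trivial channel in which P(S|W) = 1 whenever the words concatenate to S. The lexicon, its probabilities, the back-off value, and the names `TOY_LEXICON`, `UNKNOWN_CHAR_PROB`, `MAX_WORD_LEN`, and `segment` are all hypothetical and chosen only for the example. Morphologically derived words, factoids, and named entities are not modeled here.

```python
import math

# Hypothetical toy lexicon with made-up unigram probabilities P(w).
# These numbers are illustrative only and do not come from the paper.
TOY_LEXICON = {
    "研究": 0.02,
    "研究生": 0.01,
    "生命": 0.02,
    "生": 0.005,
    "命": 0.005,
    "起源": 0.01,
    "的": 0.05,
}
UNKNOWN_CHAR_PROB = 1e-6  # assumed back-off for single characters outside the lexicon
MAX_WORD_LEN = 4          # assumed cap on candidate word length


def segment(sentence):
    """Return the segmentation W maximizing P(W) * P(S|W).

    Assumes a unigram source model and P(S|W) = 1 for every W whose
    words concatenate to the input string, so only P(W) is scored.
    """
    n = len(sentence)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of sentence[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = sentence[start:end]
            if word in TOY_LEXICON:
                prob = TOY_LEXICON[word]
            elif len(word) == 1:
                prob = UNKNOWN_CHAR_PROB
            else:
                continue  # multi-character strings outside the lexicon are not candidates
            score = best[start] + math.log(prob)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the sentence to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(sentence[back[i]:i])
        i = back[i]
    return words[::-1]


if __name__ == "__main__":
    print(segment("研究生命的起源"))
```

With these toy probabilities, the ambiguous prefix 研究生命 is split as 研究 / 生命 rather than 研究生 / 命, because the first reading has the higher product of word probabilities. A fuller system along the lines of the abstract would replace the unigram scores with a richer source model and give each word type its own channel model.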