A lexicon-constrained character model for chinese morphological analysis

Authors:
Yao Meng;Hao Yu;Fumihito Nishino
Affiliations:
Fujitsu R&D Center Co., Ltd, Bejing, P. R. China;Fujitsu R&D Center Co., Ltd, Bejing, P. R. China;Fujitsu R&D Center Co., Ltd, Bejing, P. R. China
Venue:
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Year:
2005

Citing 13
Cited 0

A Chinese dictionary construction algorithm for information retrieval

ACM Transactions on Asian Language Information Processing (TALIP)
Building a large-scale annotated Chinese corpus

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Chinese unknown word identification using character-based tagging and chunking

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2
Two-character Chinese word extraction based on hybrid of internal and contextual measures

SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
A Chinese efficient analyser integrating word segmentation, part-of-speech tagging, partial parsing and full parsing

SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
The first international Chinese word segmentation Bakeoff

SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
A two-stage statistical word segmentation system for Chinese

SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Chinese word segmentation in MSR-NLP

SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Chinese word segmentation as LMR tagging

SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
HHMM-based Chinese lexical analyzer ICTCLAS

SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
A maximum entropy Chinese character-based parser

EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Adaptive Chinese word segmentation

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Chinese and Japanese word segmentation using word-level and character-level information

COLING '04 Proceedings of the 20th international conference on Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes a lexicon-constrained character model that combines both word and character features to solve complicated issues in Chinese morphological analysis. A Chinese character-based model constrained by a lexicon is built to acquire word building rules. Each character in a Chinese sentence is assigned a tag by the proposed model. The word segmentation and part-of-speech tagging results are then generated based on the character tags. The proposed method solves such problems as unknown word identification, data sparseness, and estimation bias in an integrated, unified framework. Preliminary experiments indicate that the proposed method outperforms the best SIGHAN word segmentation systems in the open track on 3 out of the 4 test corpora. Additionally, our method can be conveniently integrated with any other Chinese morphological systems as a post-processing module leading to significant improvement in performance.