Statistically-enhanced new word identification in a rule-based Chinese system

Authors:
Andi Wu; Zixin Jiang
Affiliations:
Microsoft Research, Redmond, WA; Microsoft Research, Redmond, WA
Venue:
CLPW '00 Proceedings of the second workshop on Chinese language processing: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 12
Year:
2000

Abstract

This paper presents a mechanism for new word identification in Chinese text, in which probabilities are used to filter candidate character strings and to assign a part of speech (POS) to the selected strings within a rule-based system. The mechanism avoids the sparse-data problem of purely statistical approaches and the over-generation problem of rule-based approaches. It improves parser coverage and provides a tool for the lexical acquisition of new words.
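
The two statistical steps named in the abstract, filtering candidate strings by probability and then assigning a POS to the survivors, can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say which probabilities are used or how they are combined, so the per-character tables, the product scoring, the threshold, the tag set, and the function names (`is_candidate`, `guess_pos`, `identify_new_words`) are hypothetical, and the numbers are toy values. The candidate strings are taken as given here; in the system described they would come from the rule-based component.

```python
from math import prod

# Hypothetical probability that a character stands alone as a word.
# Toy values for illustration only; not taken from the paper.
INDEPENDENT_WORD_PROB = {"甲": 0.05, "乙": 0.10, "的": 0.95, "在": 0.90}

# Hypothetical P(POS | character, position in word). Toy values.
POS_PROB = {
    ("甲", 0): {"Noun": 0.7, "Verb": 0.3},
    ("乙", 1): {"Noun": 0.6, "Verb": 0.4},
}

TAGS = ("Noun", "Verb")  # assumed tag set, for illustration


def is_candidate(chars, threshold=0.05, default=0.5):
    """Keep a string only if its characters are jointly unlikely
    to be independent words (assumed product scoring)."""
    score = prod(INDEPENDENT_WORD_PROB.get(c, default) for c in chars)
    return score < threshold


def guess_pos(chars, floor=1e-6):
    """Pick the tag with the highest product of per-position probabilities."""
    scores = {
        tag: prod(POS_PROB.get((c, i), {}).get(tag, floor)
                  for i, c in enumerate(chars))
        for tag in TAGS
    }
    return max(scores, key=scores.get)


def identify_new_words(candidates):
    """Filter the candidate strings, then tag the ones that survive."""
    return [(s, guess_pos(s)) for s in candidates if is_candidate(s)]


if __name__ == "__main__":
    print(identify_new_words(["甲乙", "的在"]))
```

With these toy tables, the string made of two characters that rarely stand alone passes the filter and is tagged, while the string made of two common function words is rejected. A real system would estimate both tables from a segmented, POS-tagged corpus rather than hard-coding them.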