Recognizing unregistered names for Mandarin word identification

Authors:
Liang-Jyh Wang;Wei-Chuan Li;Chao-Huang Chang
Affiliations:
Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, R.O.C.;Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, R.O.C.;Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, R.O.C.
Venue:
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 4
Year:
1992

Citing 1

Efficient Parsing for Natural Language: A Fast Algorithm for Practical Systems

Cited 6

A stochastic finite-state word-segmentation algorithm for Chinese
Computational Linguistics

Revision of Morphological Analysis Errors through the Person Name Construction Model
AMTA '98 Proceedings of the Third Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup

CSeg&Tag1.0: a practical word segmenter and POS tagger for Chinese texts
ANLC '97 Proceedings of the fifth conference on Applied natural language processing

A stochastic finite-state word-segmentation algorithm for Chinese
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics

An iterative algorithm to build Chinese language models
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics

Identification and classification of proper nouns in Chinese texts
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1

Abstract

Word identification (WI) has been an important and active issue in Chinese natural language processing. In this paper, a new mechanism, based on the concept of sublanguage, is proposed for identifying unknown words, especially personal names, in Chinese newspapers. The proposed mechanism includes title-driven name recognition, adaptive dynamic word formation, and identification of 2-character and 3-character Chinese names without a title. We present experimental results for two corpora and compare them with those of NTHU's statistics-based system, the only system we know of that has attacked the same problem. The results show significant improvements over WI systems without name identification capability.
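
The abstract names title-driven name recognition but does not describe its rules. As a rough illustration of the general idea only, the sketch below looks for a known title and treats the one to three characters before it as a candidate name when they begin with a known surname. The surname list, the title list, the function name `find_titled_names`, and the longest-match preference are all assumptions made for this example, not details taken from the paper.

```python
# Illustrative sketch only: the word lists and the matching rule below are
# assumptions for this example, not the method described in the paper.
SURNAMES = {"王", "李", "張", "陳", "林"}   # hypothetical single-character surnames
TITLES = ("先生", "教授", "總統", "部長")   # hypothetical titles that follow a name


def find_titled_names(text):
    """Return (name, title) pairs where a 1- to 3-character name precedes a title."""
    found = []
    for title in TITLES:
        start = text.find(title)
        while start != -1:
            # Try the longest candidate first, then shorter ones.
            for length in (3, 2, 1):
                begin = start - length
                # Accept the candidate only if it starts with a known surname.
                if begin >= 0 and text[begin] in SURNAMES:
                    found.append((text[begin:start], title))
                    break
            # Continue searching for later occurrences of the same title.
            start = text.find(title, start + 1)
    return found
```

For instance, `find_titled_names("昨天王大明先生表示")` yields the single pair `("王大明", "先生")`. Preferring the longest candidate is a simplification that can over-capture when a surname character happens to appear just before the real name. A practical system would presumably weigh such candidates against a dictionary or corpus statistics.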