Identification and classification of proper nouns in Chinese texts

Authors: Hsin-Hsi Chen; Jen-Chang Lee
Affiliation: National Taiwan University, Taipei, Taiwan, R.O.C. (both authors)
Venue: COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Year: 1996

Abstract

This paper proposes strategies to identify and classify three types of proper nouns in Chinese texts: Chinese personal names, transliterated personal names, and organization names. Chinese personal names are resolved using clues at the character, sentence, and paragraph levels. Transliterated personal names are handled with Character, Syllable, and Frequency Conditions. Organization names are identified using keywords, prefixes, word association, and parts of speech. For a fair evaluation, large-scale test data are drawn from six sections of a newspaper. Precision and recall for the three types are (88.04%, 92.56%), (50.62%, 71.93%), and (61.79%, 54.50%), respectively. When the first two types are treated as a single category, performance becomes (81.46%, 91.22%). Compared with other approaches, this approach performs better, and its classification is automatic.
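
To make the reported figures easier to compare, the sketch below shows how precision and recall are conventionally computed from counts, and derives a balanced F-measure from each (precision, recall) pair quoted in the abstract. The F-measure is not reported in the abstract; it is added here only as an illustrative summary statistic. The function names and category labels are hypothetical, and the counts behind the published percentages are not given, so only the percentages themselves are used.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) as fractions in [0, 1].

    precision = correct names / names the system proposed
    recall    = correct names / names actually present in the text
    """
    proposed = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / proposed if proposed else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean); an assumption, not from the abstract."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) pairs exactly as quoted in the abstract.
# The labels are shorthand chosen for this sketch.
REPORTED = {
    "Chinese personal names": (88.04, 92.56),
    "transliterated personal names": (50.62, 71.93),
    "organization names": (61.79, 54.50),
    "personal names combined": (81.46, 91.22),
}

for label, (p, r) in REPORTED.items():
    # Feeding percentages in yields an F-measure that is also a percentage.
    print(f"{label}: P={p:.2f}% R={r:.2f}% F={f_measure(p, r):.2f}%")
```

Because the harmonic mean penalizes imbalance, it makes the gap between the well-balanced Chinese personal name results and the weaker organization name results easy to see in a single number.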