Induction of semantic classes from natural language text
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
An Information-Theoretic Definition of Similarity
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Noun-phrase co-occurrence statistics for semiautomatic semantic lexicon construction
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Automatic construction of a hypernym-labeled noun hierarchy from text
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Knowledge extraction for identification of Chinese organization names
CLPW '00 Proceedings of the second workshop on Chinese language processing: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 12
Boosting automatic lexical acquisition with morphological information
ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9
Design of Chinese morphological analyzer
SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
Using information content to evaluate semantic similarity in a taxonomy
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
Extending a thesaurus with words from Pan-Chinese sources
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Combining contextual and structural information for supersense tagging of chinese unknown words
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I
Hi-index | 0.00 |
This paper describes a classifier that assigns semantic thesaurus categories to unknown Chinese words (words not already in the CiLin thesaurus and the Chinese Electronic Dictionary, but in the Sinica Corpus). The focus of the paper differs in two ways from previous research in this particular area.Prior research in Chinese unknown words mostly focused on proper nouns (Lee 1993, Lee, Lee and Chen 1994, Huang, Hong and Chen 1994, Chen and Chen 2000). This paper does not address proper nouns, focusing rather on common nouns, adjectives, and verbs. My analysis of the Sinica Corpus shows that contrary to expectation, most of unknown words in Chinese are common nouns, adjectives, and verbs rather than proper nouns. Other previous research has focused on features related to unknown word contexts (Caraballo 1999; Roark and Charniak 1998). While context is clearly an important feature, this paper focuses on non-contextual features, which may play a key role for unknown words that occur only once and hence have limited context. The feature I focus on, following Ciaramita (2002), is morphological similarity to words whose semantic category is known. My nearest neighbor approach to lexical acquisition computes the distance between an unknown word and examples from the CiLin thesaurus based upon its morphological structure. The classifier improves on baseline semantic categorization performance for adjectives and verbs, but not for nouns.