Positioning unknown words in a thesaurus by using information extracted from a corpus

Authors:
Naohiko Uramoto
Affiliations:
Tokyo Research Laboratory, Kanagawa-ken, Japan
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Year:
1996

Citing 4

Building a lexical domain map from text corpora
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1

Word-sense disambiguation using statistical models of Roget's categories trained on large corpora
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2

Automatic acquisition of hyponyms from large text corpora
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2

A WordNet-based algorithm for word sense disambiguation
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2

Cited 5

Exogeneous and endogeneous approaches to semantic categorization of unknown technical terms
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1

Semi-automatic practical ontology construction by using a thesaurus, computational dictionaries, and large corpora
HLTKM '01 Proceedings of the workshop on Human Language Technology and Knowledge Management - Volume 2001

Automatic feature thesaurus enrichment: extracting generic terms from digital gazetteer
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries

Semantic Labeling of Data by Using the Web
WI-IATW '06 Proceedings of the 2006 IEEE/WIC/ACM international conference on Web Intelligence and Intelligent Agent Technology

Automatic term categorization by extracting knowledge from the Web
Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence, August 29 -- September 1, 2006, Riva del Garda, Italy

Abstract

This paper describes a method for positioning unknown words in an existing thesaurus by using word-to-word relationships with relation (case) markers extracted from a large corpus. A suitable area of the thesaurus for an unknown word is estimated by integrating the human intuition buried in the thesaurus with statistical data extracted from the corpus. To overcome the problem of data sparseness, distinguishing features of each node, called "viewpoints," are extracted automatically and used to calculate the similarity between the unknown word and a word in the thesaurus. The results of an experiment confirm the contribution of viewpoints to the positioning task.
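
The idea in the abstract can be illustrated with a toy sketch. It is not the paper's algorithm: the abstract gives no formulas, so the data, the definition of a viewpoint (here, a relation feature seen under one thesaurus node and under no other), and the scoring rule (a count of matching viewpoint features) are all assumptions made for illustration. The names `TRIPLES`, `THESAURUS`, `viewpoints`, `score`, and `position` are hypothetical. The sketch also scores whole nodes, whereas the abstract speaks of similarity between the unknown word and a word in the thesaurus.

```python
from collections import Counter

# Invented (word, case_marker, related_word) triples standing in for
# word-to-word relationships extracted from a corpus.
TRIPLES = [
    ("dog", "subj", "bark"), ("dog", "obj", "feed"), ("dog", "subj", "run"),
    ("cat", "obj", "feed"), ("cat", "subj", "purr"), ("cat", "subj", "run"),
    ("car", "obj", "drive"), ("car", "subj", "run"),
    ("bus", "obj", "drive"), ("bus", "obj", "ride"),
    # "puppy" plays the role of the unknown word (absent from the thesaurus).
    ("puppy", "obj", "feed"), ("puppy", "subj", "bark"), ("puppy", "subj", "run"),
]

# Invented two-node thesaurus: node name -> member words.
THESAURUS = {"animal": ["dog", "cat"], "vehicle": ["car", "bus"]}


def features(word):
    """Count the (case_marker, related_word) features observed for a word."""
    return Counter((m, r) for w, m, r in TRIPLES if w == word)


def node_features(node):
    """Pool the features of all words listed under a thesaurus node."""
    total = Counter()
    for word in THESAURUS[node]:
        total += features(word)
    return total


def viewpoints(node):
    """Assumed definition: features seen under this node and no other node."""
    own = set(node_features(node))
    others = set()
    for other in THESAURUS:
        if other != node:
            others |= set(node_features(other))
    return own - others


def score(unknown, node):
    """Count how often the unknown word occurs with the node's viewpoints."""
    vp = viewpoints(node)
    return sum(n for feat, n in features(unknown).items() if feat in vp)


def position(unknown):
    """Pick the node whose viewpoints best match the unknown word."""
    return max(THESAURUS, key=lambda node: score(unknown, node))


print(position("puppy"))
```

In this toy data the feature `("subj", "run")` occurs under both nodes, so it is not a viewpoint of either and does not influence the result. Restricting the comparison to distinguishing features in this way is one plausible reading of how viewpoints could help when corpus data is sparse.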