Distribution-based semantic similarity of nouns

Authors:
Igor A. Bolshakov;Alexander Gelbukh
Affiliations:
Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico;Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico
Venue:
CIARP'07 Proceedings of the Congress on pattern recognition 12th Iberoamerican conference on Progress in pattern recognition, image analysis and applications
Year:
2007

Abstract

In our previous work we proposed two methods for evaluating the semantic similarity or dissimilarity of nouns based on their modifier sets as registered in the Oxford Collocations Dictionary for Students of English. In this paper we provide further experimental support for these methods and discuss them in more detail. Given two nouns, the first method measures similarity as the relative size of the intersection of their modifier sets, that is, the share of modifiers applicable to both nouns. The second method measures dissimilarity as the difference between two mean cohesion values: the mean cohesion between a noun and its own modifiers, and the mean cohesion between the same noun and the modifiers of the other noun. Here, cohesion between words is estimated from Web statistics on word co-occurrences. The two measures turn out to be approximately inversely related. Our experiments show that Web-based weighting (the second method) gives better results.
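
The two measures described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so several choices here are assumptions rather than the authors' definitions: the overlap is normalized by the geometric mean of the two set sizes, the cohesion scores are hypothetical numbers standing in for values derived from Web co-occurrence counts, and the function names and toy modifier sets are invented for illustration.

```python
import math

def overlap_similarity(mods_a, mods_b):
    """Relative size of the intersection of two modifier sets.

    Assumption: normalized by the geometric mean of the set sizes;
    the abstract only says "relative size of the intersection".
    """
    if not mods_a or not mods_b:
        return 0.0
    return len(mods_a & mods_b) / math.sqrt(len(mods_a) * len(mods_b))

def mean_cohesion(noun, modifiers, cohesion):
    """Mean cohesion between one noun and a set of modifiers.

    `cohesion` maps (modifier, noun) pairs to a score; pairs that are
    missing from the table are treated as having zero cohesion (assumption).
    """
    return sum(cohesion.get((m, noun), 0.0) for m in modifiers) / len(modifiers)

def cohesion_dissimilarity(noun, own_mods, other_mods, cohesion):
    """Mean cohesion with the noun's own modifiers minus mean cohesion
    with the other noun's modifiers. Larger values mean less similar nouns."""
    return (mean_cohesion(noun, own_mods, cohesion)
            - mean_cohesion(noun, other_mods, cohesion))

# Toy modifier sets (hypothetical, not taken from the dictionary).
tea_mods = {"strong", "hot", "green", "herbal"}
coffee_mods = {"strong", "hot", "black", "instant"}

# Hypothetical cohesion scores for modifier + "tea"; in the paper these
# would come from Web statistics on word co-occurrences.
cohesion_scores = {
    ("strong", "tea"): 6.0,
    ("hot", "tea"): 8.0,
    ("green", "tea"): 9.0,
    ("herbal", "tea"): 9.0,
    ("black", "tea"): 5.0,
    ("instant", "tea"): 1.0,
}

similarity = overlap_similarity(tea_mods, coffee_mods)
dissimilarity = cohesion_dissimilarity("tea", tea_mods, coffee_mods, cohesion_scores)
print(similarity, dissimilarity)
```

Note that the second measure, as sketched, is asymmetric: it is computed from the point of view of one noun, so swapping the two nouns generally gives a different value. Whether and how the two directions are combined is not stated in the abstract.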