Measurement of similarity between nouns

Authors:
Kenneth E. Harper
Affiliations:
The RAND Corporation, Santa Monica, California
Venue:
COLING '65 Proceedings of the 1965 conference on Computational linguistics
Year:
1965

Cited by:
Probabilistic models of similarity in syntactic context. EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Distributional techniques for philosophical enquiry. LaTeCH '12 Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Abstract

A study was made of the degree of similarity between pairs of Russian nouns, as expressed by their tendency to occur in sentences with identical words in identical syntactic relationships. A similarity matrix was prepared for forty nouns; for each pair of nouns the number of shared (i) adjective dependents, (ii) noun dependents, and (iii) noun governors was automatically retrieved from machine-processed text. The similarity coefficient for each pair was determined as the ratio of the total of such shared words to the product of the frequencies of the two nouns in the text. The 780 pairs were ranked according to this coefficient. The text comprised 120,000 running words of physics text processed at The RAND Corporation; the frequencies of occurrence of the forty nouns in this text ranged from 42 to 328. The results suggest that the sample of text is of sufficient size to be useful for the intended purpose. Many noun pairs with similar properties (synonymy, antonymy, derivation from distributionally similar verbs, etc.) are characterized by high similarity coefficients; the converse is not observed. The relevance of various syntactic relationships as criteria for measurement is discussed.
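
The coefficient described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the paper: the abstract does not say whether shared words are counted as distinct word types or as occurrences, so the sketch assumes distinct types. The names `RELATIONS`, `similarity`, and `rank_pairs` are hypothetical, and the three English nouns with their context sets and frequencies are invented toy values, not data from the study.

```python
from itertools import combinations

# The three syntactic relations named in the abstract (labels are hypothetical).
RELATIONS = ("adj_dep", "noun_dep", "noun_gov")

# Invented toy data: for each noun, the set of distinct words seen in each relation.
contexts = {
    "energy":   {"adj_dep": {"high", "total"}, "noun_dep": {"particle"}, "noun_gov": {"loss"}},
    "momentum": {"adj_dep": {"total"},         "noun_dep": {"particle"}, "noun_gov": set()},
    "method":   {"adj_dep": {"new"},           "noun_dep": set(),        "noun_gov": set()},
}

# Invented toy frequencies of each noun in the text.
freq = {"energy": 100, "momentum": 50, "method": 80}


def similarity(a, b, contexts, freq):
    """Total shared context words over the three relations,
    divided by the product of the two noun frequencies."""
    # Assumption: "shared" means distinct word types common to both nouns.
    shared = sum(len(contexts[a][rel] & contexts[b][rel]) for rel in RELATIONS)
    return shared / (freq[a] * freq[b])


def rank_pairs(contexts, freq):
    """Score every unordered noun pair and sort by descending coefficient."""
    pairs = combinations(sorted(contexts), 2)
    scored = [(similarity(a, b, contexts, freq), a, b) for a, b in pairs]
    return sorted(scored, reverse=True)


ranking = rank_pairs(contexts, freq)
for score, a, b in ranking:
    print(f"{a:10s} {b:10s} {score:.6f}")
```

With forty nouns, the same enumeration of unordered pairs yields 40 × 39 / 2 = 780 pairs, which matches the number ranked in the study. Dividing by the product of the frequencies keeps frequent nouns from ranking high merely because they have more observed contexts.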