Using web-search results to measure word-group similarity

  • Authors: Ann Gledson; John Keane

  • Affiliations: University of Manchester, Manchester, UK; University of Manchester, Manchester, UK

  • Venue: COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
  • Year: 2008

Abstract

Semantic relatedness between words is important to many NLP tasks, and numerous measures exist which use a variety of resources. Thus far, such work has been confined to measuring similarity between two words (or two texts), and only a handful of measures use the web as a corpus. This paper introduces a distributional similarity measure which uses internet search counts and also extends to calculating the similarity within word-groups. The evaluation results are encouraging: for word-pairs, the correlations with human judgments are comparable with state-of-the-art web-search page-count heuristics. When used to measure similarities within sets of 10 words, the results correlate highly (up to 0.8) with those expected. Relatively little comparison has been made between the results of different search engines. Here, we compare experimental results from Google, Windows Live Search and Yahoo and find noticeable differences.
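
To make the idea of a page-count heuristic concrete, the following is a minimal Python sketch. It is not the measure introduced in the paper, whose definition the abstract does not give. It assumes a Jaccard coefficient over search hit counts for the word-pair case and, purely as an illustrative assumption, averages pairwise scores to obtain a score for a word group. The function names, the `lookup_hits` helper and all counts are hypothetical; a real system would obtain the counts from a search engine.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple


def web_jaccard(count_a: int, count_b: int, count_ab: int) -> float:
    """Jaccard coefficient over page counts: hits(a AND b) / hits(a OR b)."""
    union = count_a + count_b - count_ab
    if union <= 0:
        return 0.0
    return count_ab / union


def group_similarity(
    words: Sequence[str],
    hits: Callable[[Tuple[str, ...]], int],
) -> float:
    """Mean pairwise Jaccard score over a word group.

    Averaging pairs is an assumption made for illustration only;
    it is not claimed to be how the paper scores word groups.
    """
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        total += web_jaccard(hits((a,)), hits((b,)), hits((a, b)))
    return total / len(pairs)


# Hypothetical page counts standing in for search-engine results.
HYPOTHETICAL_COUNTS = {
    frozenset(["car"]): 1000,
    frozenset(["automobile"]): 400,
    frozenset(["banana"]): 500,
    frozenset(["car", "automobile"]): 200,
    frozenset(["car", "banana"]): 10,
    frozenset(["automobile", "banana"]): 5,
}


def lookup_hits(terms: Tuple[str, ...]) -> int:
    """Return the hypothetical hit count for a conjunctive query."""
    return HYPOTHETICAL_COUNTS.get(frozenset(terms), 0)


if __name__ == "__main__":
    print(group_similarity(["car", "automobile"], lookup_hits))
    print(group_similarity(["car", "automobile", "banana"], lookup_hits))
```

Because the counts are passed in through a callable, the same scoring code could be run against counts from different search engines, which is the kind of comparison the abstract describes.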