Using web-search results to measure word-group similarity

Authors:
Ann Gledson;John Keane
Affiliations:
University of Manchester, Manchester, UK;University of Manchester, Manchester, UK
Venue:
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Year:
2008

Citing 20
Cited 5

Contextual correlates of synonymy

Communications of the ACM
Placing search in context: the concept revisited

ACM Transactions on Information Systems (TOIS)
Introduction to the special issue on the web as corpus

Computational Linguistics - Special issue on web as corpus
Using the web to obtain frequencies for unseen bigrams

Computational Linguistics - Special issue on web as corpus
Lexical cohesion computed by thesaural relations as an indicator of the structure of text

Computational Linguistics
A method for word sense disambiguation of unrestricted text

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Structural Semantic Interconnections: A Knowledge-Based Approach to Word Sense Disambiguation

IEEE Transactions on Pattern Analysis and Machine Intelligence
Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity

Computational Linguistics
Evaluating WordNet-based Measures of Lexical Semantic Relatedness

Computational Linguistics
Novel association measures using web search with double checking

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Measuring semantic similarity between words using web search engines

Proceedings of the 16th international conference on World Wide Web
Googleology is Bad Science

Computational Linguistics
Unsupervised acquisition of predominant word senses

Computational Linguistics
WordNet: similarity - measuring the relatedness of concepts

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
WikiRelate! computing semantic relatedness using wikipedia

AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Corpus-based and knowledge-based measures of text semantic similarity

AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Measuring topic homogeneity and its application to dictionary-based word sense disambiguation

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Computing semantic relatedness using Wikipedia-based explicit semantic analysis

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Search engine statistics beyond the n-gram: application to noun compound bracketing

CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
Using measures of semantic relatedness for word sense disambiguation

CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing

Measuring topic homogeneity and its application to dictionary-based word sense disambiguation

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
The effect of corpus size on case frame acquisition for discourse analysis

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Source code identifier splitting using Yahoo image and web search engine

Proceedings of the First International Workshop on Software Mining
Document classification based on web search hit counts

Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
Document clustering based on web search hit counts

International Journal of Business Intelligence and Data Mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Semantic relatedness between words is important to many NLP tasks, and numerous measures exist which use a variety of resources. Thus far, such work is confined to measuring similarity between two words (or two texts), and only a handful utilize the web as a corpus. This paper introduces a distributional similarity measure which uses internet search counts and also extends to calculating the similarity within word-groups. The evaluation results are encouraging: for word-pairs, the correlations with human judgments are comparable with state-of-the-art web-search page-count heuristics. When used to measure similarities within sets of 10 words, the results correlate highly (up to 0.8) with those expected. Relatively little comparison has been made between the results of different search-engines. Here, we compare experimental results from Google, Windows Live Search and Yahoo and find noticeable differences.