Semantic similarity measurement using historical Google search patterns

Authors:
Jorge Martinez-Gil;José F. Aldana-Montes
Affiliations:
Department of Computer Science, University of Malaga, Malaga, Spain;Department of Computer Science, University of Malaga, Malaga, Spain
Venue:
Information Systems Frontiers
Year:
2013


Abstract

Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexicographically similar is an important challenge in the information integration field. Techniques for measuring textual semantic similarity often fail on words that are not covered by synonym dictionaries. In this paper, we address this problem by determining the semantic similarity of terms using the knowledge inherent in the search history logs of the Google search engine. To do this, we have designed and evaluated four algorithmic methods for measuring the semantic similarity between terms using their associated historical search patterns. These methods are: a) frequent co-occurrence of terms in search patterns, b) computation of the relationship between search patterns, c) outlier coincidence on search patterns, and d) forecasting comparisons. We have shown experimentally that some of these methods correlate well with human judgment when evaluated on general-purpose benchmark datasets, and significantly outperform existing methods when evaluated on datasets containing terms that do not usually appear in dictionaries.
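As an illustration of method (b), the following is a minimal sketch of one way the relationship between two search patterns might be computed. It assumes each term's search pattern is an equal-length series of search volumes and uses Pearson's correlation coefficient as the relationship measure; the abstract does not specify the exact measure. The function names `pearson` and `pattern_similarity`, and the clamping of negative correlations to zero, are hypothetical choices made for this example.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no pattern to compare (assumption: score 0).
        return 0.0
    return cov / sqrt(var_x * var_y)


def pattern_similarity(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Map the correlation to [0, 1] by clamping negatives to 0 (assumption)."""
    return max(0.0, pearson(xs, ys))
```

Under these assumptions, two terms whose search volumes rise and fall together over time receive a score close to 1, while terms with unrelated or opposing patterns receive a score close to 0.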