MatchSim: a novel neighbor-based similarity measure with maximum neighborhood matching

Authors:
Zhenjiang Lin;Michael R. Lyu;Irwin King
Affiliations:
The Chinese University of Hong Kong, Hong Kong, Hong Kong;The Chinese University of Hong Kong, Hong Kong, Hong Kong;The Chinese University of Hong Kong, Hong Kong, Hong Kong
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Citing 12

Algorithms for clustering data
Automatic text processing
Finding related pages in the World Wide Web. WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM)
Modern Information Retrieval
SimRank: a measure of structural-context similarity. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Node similarity in networked information spaces. CASCON '01 Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
The link prediction problem for social networks. CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
A new suffix tree similarity measure for document clustering. Proceedings of the 16th international conference on World Wide Web
PageSim: A Novel Link-Based Similarity Measure for the World Wide Web. WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
TF-IDF uncovered: a study of theories and probabilities. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Joke retrieval: recognizing the same joke told differently. Proceedings of the 17th ACM conference on Information and knowledge management

Cited 5

Enhancing link-based similarity through the use of non-numerical labels and prior information. Proceedings of the Eighth Workshop on Mining and Learning with Graphs
SimRate: improve collaborative recommendation based on rating graph for sparsity. ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II
Axiomatic ranking of network role similarity. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Introduction to social computing. DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part II
Scalable and axiomatic ranking of network role similarity. ACM Transactions on Knowledge Discovery from Data (TKDD) - Casin special issue

Abstract

The problem of measuring similarity between web pages arises in many important Web applications, such as search engines and Web directories. In this paper, we propose a novel neighbor-based similarity measure called MatchSim, which uses only the neighborhood structure of web pages. Technically, MatchSim recursively defines the similarity between two web pages as the average similarity of the maximum matching between their neighbors. Our method extends traditional methods, which simply count the number of common and/or different neighbors. It also overcomes a severe counterintuitive loophole in SimRank, because it is strictly consistent with intuitions about similarity. We give the computational complexity of the MatchSim iteration. We compare the accuracy of MatchSim against other similarity measures on two real datasets; the results show that our method performs best in most cases.
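
The recursive definition in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: in-links are used as the neighborhood, the matching weight is normalized by the size of the larger neighbor set, scores start from the identity (a page is fully similar to itself, and to nothing else), a page's similarity to itself is held at 1, and pages without neighbors get score 0 against any other page. The names matchsim, max_matching_weight and in_neighbors are hypothetical. The matching step uses brute force over permutations, which is only usable on tiny neighbor sets; a polynomial-time assignment algorithm would replace it in practice.

```python
from itertools import permutations


def max_matching_weight(left, right, sim):
    """Weight of a maximum-weight matching between two small node lists.

    Brute force: tries every injective assignment of the smaller list
    into the larger one. Exponential, so for illustration only.
    """
    if len(left) > len(right):
        left, right = right, left
    if not left:
        return 0.0
    best = 0.0
    for perm in permutations(right, len(left)):
        total = sum(sim[(u, v)] for u, v in zip(left, perm))
        best = max(best, total)
    return best


def matchsim(in_neighbors, iterations=10):
    """Iterate a MatchSim-style update for a fixed number of rounds.

    in_neighbors maps each page to the list of pages linking to it
    (assumed neighborhood definition). Returns a dict keyed by (a, b).
    """
    nodes = sorted(in_neighbors)
    # Assumed initialization: identity similarity.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[(a, b)] = 1.0  # assumed: self-similarity stays 1
                    continue
                na, nb = list(in_neighbors[a]), list(in_neighbors[b])
                if not na or not nb:
                    new_sim[(a, b)] = 0.0  # assumed: no neighbors, no evidence
                    continue
                weight = max_matching_weight(na, nb, sim)
                # Assumed normalization: divide by the larger neighbor set.
                new_sim[(a, b)] = weight / max(len(na), len(nb))
        sim = new_sim
    return sim
```

In this sketch, two pages with identical neighbor sets score 1, while a page whose single neighbor is shared with a two-neighbor page scores 0.5 against it, since only one of the two neighbors can be matched.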