Calculating Similarity Efficiently in a Small World

Authors:
Xu Jia;Yuanzhe Cai;Hongyan Liu;Jun He;Xiaoyong Du
Affiliations:
Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing and Department of Computer Science, Renmin University of China, Beijing 100872;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing and Department of Computer Science, Renmin University of China, Beijing 100872;Department of Management Science and Engineering, Tsinghua University, Beijing 100084;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing and Department of Computer Science, Renmin University of China, Beijing 100872;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing and Department of Computer Science, Renmin University of China, Beijing 100872
Venue:
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Year:
2009

Abstract

SimRank is a well-known algorithm for similarity calculation based on link analysis, but it suffers from high computational cost. The World Wide Web graph has been shown to be a "small world graph". In this paper, we observe that in this kind of graph, node pairs whose similarity scores are zero after the first several iterations remain zero in the final output. Based on this observation, we propose a novel algorithm called SW-SimRank, which speeds up similarity calculation by avoiding recalculation of the similarity scores of these unreachable pairs. Our experimental results on web datasets show the efficiency of our approach. The larger the proportion of unreachable pairs in the relationship graph, the greater the improvement SW-SimRank achieves. In addition, SW-SimRank can be integrated with other SimRank acceleration methods.
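
The pruning idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' SW-SimRank implementation. It is a naive SimRank iteration that, after a fixed number of warm-up iterations, stops updating pairs whose score is still zero. The function name `simrank_pruned`, the `in_nbrs` adjacency format, the decay factor `c`, and the `warmup` parameter are all assumptions made for illustration. The abstract does not say how many iterations count as "the first several".

```python
from itertools import combinations


def simrank_pruned(in_nbrs, c=0.8, iters=10, warmup=3):
    """Naive SimRank that drops zero-score pairs after `warmup` iterations.

    in_nbrs maps every node to the list of its in-neighbours
    (hypothetical input format chosen for this sketch).
    """
    nodes = sorted(in_nbrs)
    # s(a, a) = 1, every other pair starts at 0.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    # A pair can only score above zero if both nodes have in-neighbours.
    active = [(a, b) for a, b in combinations(nodes, 2)
              if in_nbrs[a] and in_nbrs[b]]

    for k in range(iters):
        new = dict(sim)
        for a, b in active:
            total = sum(sim[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
            score = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
            new[(a, b)] = new[(b, a)] = score
        sim = new
        # Assumption taken from the abstract: in a small world graph,
        # pairs still at zero after the first few iterations stay at zero,
        # so they are removed from the update list.
        if k + 1 == warmup:
            active = [p for p in active if sim[p] > 0.0]
    return sim


# Two disconnected components: a -> {b, c} and d -> {e, f}.
graph = {"a": [], "b": ["a"], "c": ["a"], "d": [], "e": ["d"], "f": ["d"]}
scores = simrank_pruned(graph, warmup=1)
```

On a graph where the observation does not hold, pruning too early would change the result, so `warmup` trades accuracy against the amount of work saved. Pairs in different components, such as `("b", "e")` above, are the unreachable pairs that no longer need to be recomputed.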