An Adaptive Method for the Efficient Similarity Calculation

Authors:
Yuanzhe Cai;Hongyan Liu;Jun He;Xiaoyong Du;Xu Jia
Affiliations:
Key Labs of Data Engineering and Knowledge Engineering, MOE, P.R. China and School of Information, Renmin University of China, P.R. China;Department of Management Science and Engineering, Tsinghua University, P.R. China;Key Labs of Data Engineering and Knowledge Engineering, MOE, P.R. China and School of Information, Renmin University of China, P.R. China;Key Labs of Data Engineering and Knowledge Engineering, MOE, P.R. China and School of Information, Renmin University of China, P.R. China;Key Labs of Data Engineering and Knowledge Engineering, MOE, P.R. China and School of Information, Renmin University of China, P.R. China
Venue:
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Year:
2009

Citing 4

Data mining: concepts and techniques
SimRank: a measure of structural-context similarity. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Scaling link-based similarity search. WWW '05 Proceedings of the 14th international conference on World Wide Web
LinkClus: efficient clustering via heterogeneous semantic links. VLDB '06 Proceedings of the 32nd international conference on Very large data bases

Cited 1

A fast two-stage algorithm for computing SimRank and its extensions. WAIM'10 Proceedings of the 2010 international conference on Web-age information management

Abstract

SimRank is a well-known algorithm that computes similarity from object-to-object relationships, but its computation cost is high. In this paper, we observe that different object pairs converge at different rates when SimRank is used to compute their similarity: many similarity scores converge quickly, while others need more iterations. Based on this observation, we propose an adaptive method, Adaptive-SimRank, to speed up similarity calculation. With this method, the similarity of pairs that have already converged does not need to be recalculated. Experiments on web datasets and a synthetic dataset show that the new method reduces the running time by nearly 35%.
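
The idea of skipping converged pairs can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how convergence is detected, so the sketch assumes a pair is treated as converged once its score changes by less than a threshold eps between two iterations. The function name adaptive_simrank, the parameters c, max_iter and eps, and the returned update counter are all hypothetical and exist only for this illustration. The update rule itself is the standard SimRank recurrence: s(a, a) = 1, and for a != b the score is c times the average similarity over all pairs of in-neighbors of a and b.

```python
from itertools import combinations


def adaptive_simrank(in_neighbors, c=0.8, max_iter=10, eps=1e-4):
    """Iterative SimRank that stops updating pairs once they look converged.

    in_neighbors maps every node to the list of its in-neighbors.
    Returns (scores, number_of_pair_updates_performed).
    """
    nodes = sorted(in_neighbors)
    # Standard SimRank initialisation: s(a, a) = 1, all other pairs 0.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    # A pair with an empty in-neighbor set on either side stays at 0,
    # so only the remaining pairs ever need to be updated.
    active = {
        (a, b)
        for a, b in combinations(nodes, 2)
        if in_neighbors[a] and in_neighbors[b]
    }
    updates = 0
    for _ in range(max_iter):
        if not active:
            break
        new_sim = dict(sim)
        converged = set()
        for a, b in active:
            total = sum(
                sim[(i, j)] for i in in_neighbors[a] for j in in_neighbors[b]
            )
            score = c * total / (len(in_neighbors[a]) * len(in_neighbors[b]))
            updates += 1
            # ASSUMPTION: a pair counts as converged when its change is
            # below eps. Pairs still at 0 are kept active, because their
            # score may only become positive in a later iteration.
            if score > 0 and abs(score - sim[(a, b)]) < eps:
                converged.add((a, b))
            new_sim[(a, b)] = new_sim[(b, a)] = score
        sim = new_sim
        # Frozen pairs keep their last score and are still read by
        # the pairs that remain active.
        active -= converged
    return sim, updates
```

On a three-node graph where a links to both b and c, the only active pair (b, c) reaches its final score of 0.8 in the first iteration and is frozen after the second, so two pair updates are performed instead of one per iteration. Freezing on a small change is a heuristic: a frozen score may differ slightly from the one a full SimRank run would produce.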