Calculating Similarity Efficiently in a Small World

  • Authors:
  • Xu Jia;Yuanzhe Cai;Hongyan Liu;Jun He;Xiaoyong Du

  • Affiliations:
  • Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing and Department of Computer Science, Renmin University of China, Beijing 100872;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing and Department of Computer Science, Renmin University of China, Beijing 100872;Department of Management Science and Engineering, Tsinghua University, Beijing 100084;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing and Department of Computer Science, Renmin University of China, Beijing 100872;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing and Department of Computer Science, Renmin University of China, Beijing 100872

  • Venue:
  • ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
  • Year:
  • 2009

Abstract

SimRank is a well-known algorithm for similarity calculation based on link analysis. However, it suffers from high computational cost. It has been shown that the World Wide Web graph is a "small world graph". In this paper, we observe that in this kind of small world graph, node pairs whose similarity scores are zero after the first several iterations remain zero in the final output. Based on this observation, we propose a novel algorithm called SW-SimRank, which speeds up similarity calculation by not recalculating the similarity scores of those unreachable pairs. Our experimental results on web datasets show the efficiency of our approach. The larger the proportion of unreachable pairs in the relationship graph, the greater the improvement SW-SimRank achieves. In addition, SW-SimRank can be integrated with other SimRank acceleration methods.
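
The idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `simrank_skip_zero`, the `warmup` parameter (how many full iterations run before pruning), and the default decay factor `c = 0.8` are assumptions made for illustration. The sketch runs the standard SimRank iteration and, once the warm-up iterations are done, stops updating every pair whose score is still zero. The pruned result matches plain SimRank only on graphs where the abstract's observation holds for the chosen `warmup`.

```python
def simrank_skip_zero(in_nbrs, c=0.8, iters=10, warmup=3):
    """Iterative SimRank that stops updating pairs still at zero after `warmup` iterations.

    in_nbrs maps every node to the list of its in-neighbors
    (nodes without in-neighbors must map to an empty list).
    `warmup` is a hypothetical tuning parameter, not a value from the paper.
    """
    nodes = sorted(in_nbrs)
    # s(a, a) = 1 and s(a, b) = 0 initially for a != b.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    # Pairs that are still recomputed in each iteration.
    active = [(a, b) for a in nodes for b in nodes if a != b]

    for k in range(iters):
        new = dict(sim)
        for a, b in active:
            ia, ib = in_nbrs[a], in_nbrs[b]
            if not ia or not ib:
                continue  # score stays 0 when either node has no in-neighbors
            total = sum(sim[i, j] for i in ia for j in ib)
            new[a, b] = c * total / (len(ia) * len(ib))
        sim = new
        if k + 1 == warmup:
            # Assume pairs that are zero now stay zero; skip them from here on.
            active = [p for p in active if sim[p] > 0.0]
    return sim


# Two disconnected components: u -> a, u -> b and x -> y.
graph = {"u": [], "a": ["u"], "b": ["u"], "x": [], "y": ["x"]}
pruned = simrank_skip_zero(graph, warmup=2)
full = simrank_skip_zero(graph, warmup=11)  # warmup > iters, so nothing is pruned
```

In this toy graph, `a` and `b` share the in-neighbor `u`, while `a` and `y` lie in different components, so the pair `(a, y)` is dropped from the update list after the warm-up and the pruned and unpruned runs give the same scores.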