A fast two-stage algorithm for computing SimRank and its extensions

Authors:
Xu Jia;Hongyan Liu;Li Zou;Jun He;Xiaoyong Du
Affiliations:
Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China;Department of Management Science and Engineering, Tsinghua University, China;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China;Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China and Department of Computer Science, Renmin University of China, China
Venue:
WAIM'10 Proceedings of the 2010 international conference on Web-age information management
Year:
2010

Citing 18
Cited 0

Generalized vector spaces model in information retrieval
SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval

The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7

On power-law relationships of the Internet topology
Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication

Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)

SimRank: a measure of structural-context similarity
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

A Measure of Similarity between Graph Vertices: Applications to Synonym Extraction and Web Searching
SIAM Review

Scaling link-based similarity search
WWW '05 Proceedings of the 14th international conference on World Wide Web

SimFusion: measuring similarity using unified relationship matrix
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

PageSim: a novel link-based measure of web page similarity
Proceedings of the 15th international conference on World Wide Web

LinkClus: efficient clustering via heterogeneous semantic links
VLDB '06 Proceedings of the 32nd international conference on Very large data bases

Fast Random Walk with Restart and Its Applications
ICDM '06 Proceedings of the Sixth International Conference on Data Mining

Simrank++: query rewriting through link analysis of the click graph
Proceedings of the VLDB Endowment

Accuracy estimate and optimization techniques for SimRank computation
Proceedings of the VLDB Endowment

An Adaptive Method for the Efficient Similarity Calculation
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications

Exploiting the Block Structure of Link Graph for Efficient Similarity Computation
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

Calculating Similarity Efficiently in a Small World
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications

P-Rank: a comprehensive structural similarity measure over information networks
Proceedings of the 18th ACM conference on Information and knowledge management

Efficient Algorithm for Computing Link-Based Similarity in Real World Networks
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining

Abstract

Similarity estimation is used in many applications, such as recommender systems, cluster analysis, information retrieval, and link prediction. SimRank is a well-known algorithm that measures the similarity of objects based on link structure. We observe that if a node has no in-links, the similarity score between this node and any other node is always zero. Based on this observation, we propose a new algorithm, fast two-stage SimRank (F2S-SimRank), which avoids storing these unnecessary zeros and accelerates the computation without any loss of accuracy, so it uses less computation time and occupies less main memory. Experiments conducted on real and synthetic datasets demonstrate the effectiveness and efficiency of F2S-SimRank.
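
The observation about nodes without in-links can be illustrated with a small sketch. This is not the authors' F2S-SimRank, whose two stages are not described in the abstract; it is a plain iterative SimRank that stores scores only for pairs of nodes that both have in-links. The function names `lookup` and `simrank_skip_sources`, the adjacency format, and the decay factor `c = 0.8` are assumptions made for the example.

```python
from itertools import combinations


def lookup(scores, a, b):
    """Score of the pair (a, b); pairs that are not stored are implicitly zero."""
    if a == b:
        return 1.0
    return scores.get((a, b) if a < b else (b, a), 0.0)


def simrank_skip_sources(in_links, c=0.8, iterations=10):
    """Naive iterative SimRank that never stores pairs involving a node
    without in-links (illustrative sketch, not the paper's algorithm).

    in_links maps each node to the list of its in-neighbours.
    Returns a dict {(a, b): score} with a < b, holding non-zero scores only.
    """
    # Only nodes with at least one in-link can have a non-zero score
    # with another node, so only these are paired up.
    active = sorted(node for node, preds in in_links.items() if preds)
    scores = {}
    for _ in range(iterations):
        updated = {}
        for a, b in combinations(active, 2):
            total = sum(
                lookup(scores, i, j)
                for i in in_links[a]
                for j in in_links[b]
            )
            if total:
                updated[(a, b)] = c * total / (len(in_links[a]) * len(in_links[b]))
        scores = updated
    return scores


# Hypothetical toy graph: u -> a, u -> b, a -> c, b -> c (u has no in-links).
example = {"u": [], "a": ["u"], "b": ["u"], "c": ["a", "b"]}
print(simrank_skip_sources(example, c=0.8, iterations=5))
# {('a', 'b'): 0.8}
```

In the toy graph, node u is never paired with anything, yet it still contributes through its self-similarity of 1 when the pair (a, b) is scored, so the stored results match what a full pairwise computation would produce for the remaining nodes.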