Delta-SimRank computing on MapReduce

Authors:
Liangliang Cao;Brian Cho;Hyun Duk Kim;Zhen Li;Min-Hsuan Tsai;Indranil Gupta
Affiliations:
IBM Watson Research Center;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign
Venue:
Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Year:
2012



Abstract

Based on the intuition that "two objects are similar if they are related to similar objects", SimRank (proposed by Jeh and Widom in 2002) has become a well-known measure of the similarity between two nodes based on network structure. Although SimRank is applicable to a wide range of areas, such as social networks, citation networks, and link prediction, it suffers from high computational complexity and heavy space requirements. Most existing efforts to accelerate SimRank computation work only for static graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks that grow over time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive both SimRank and the proposed Delta-SimRank, which is shown to fit the nature of distributed computing and can be implemented efficiently using Google's MapReduce paradigm. Delta-SimRank effectively reduces the computational cost and also benefits applications with non-static network structures. Our experimental results on four real-world networks show that Delta-SimRank is much more efficient than the distributed SimRank algorithm, achieving up to a 30-fold speed-up in the best case.
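
The abstract gives no formulas, so the following is only a minimal single-machine sketch of the general idea, not the authors' MapReduce implementation. It assumes the standard Jeh and Widom recurrence s(a,b) = C / (|I(a)||I(b)|) * sum of s(i,j) over in-neighbors i of a and j of b, with s(a,a) = 1. Because that update is linear for a != b, the change in each score between iterations can be propagated on its own, and pairs whose change is zero can be skipped. The function names, the decay factor C = 0.8, the eps threshold, and the example graph are all illustrative assumptions.

```python
from itertools import product


def simrank_step(in_nbrs, scores, C=0.8):
    """One full SimRank iteration over every node pair (assumed standard recurrence)."""
    nodes = list(in_nbrs)
    new = {}
    for a, b in product(nodes, nodes):
        if a == b:
            new[(a, b)] = 1.0
        elif in_nbrs[a] and in_nbrs[b]:
            total = sum(scores[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
            new[(a, b)] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        else:
            new[(a, b)] = 0.0
    return new


def initial_scores(in_nbrs):
    """s_0(a, b) = 1 if a == b, else 0."""
    return {(a, b): 1.0 if a == b else 0.0 for a in in_nbrs for b in in_nbrs}


def simrank(in_nbrs, C=0.8, iters=10):
    """Plain iterative SimRank: recompute all pairs in every round."""
    scores = initial_scores(in_nbrs)
    for _ in range(iters):
        scores = simrank_step(in_nbrs, scores, C)
    return scores


def delta_simrank(in_nbrs, C=0.8, iters=10, eps=0.0):
    """Delta-style variant: propagate only the non-zero score changes.

    Hypothetical sketch. With eps = 0 it matches simrank() up to rounding;
    a larger eps drops small changes and trades accuracy for less work.
    """
    # Invert the in-neighbor lists so each change can be pushed forward.
    out_nbrs = {n: [] for n in in_nbrs}
    for n, preds in in_nbrs.items():
        for p in preds:
            out_nbrs[p].append(n)

    s0 = initial_scores(in_nbrs)
    scores = simrank_step(in_nbrs, s0, C)  # first round is a full update
    delta = {k: scores[k] - s0[k] for k in scores if abs(scores[k] - s0[k]) > eps}

    for _ in range(iters - 1):
        if not delta:
            break  # nothing changed, so the scores have converged
        pushed = {}
        for (c, d), change in delta.items():
            for a in out_nbrs[c]:
                for b in out_nbrs[d]:
                    if a != b:  # diagonal scores stay fixed at 1
                        w = C / (len(in_nbrs[a]) * len(in_nbrs[b]))
                        pushed[(a, b)] = pushed.get((a, b), 0.0) + w * change
        delta = {k: v for k, v in pushed.items() if abs(v) > eps}
        for k, v in delta.items():
            scores[k] += v
    return scores


# Small illustrative graph, given as node -> list of in-neighbors.
example_graph = {
    "Univ": ["StudentA"],
    "ProfA": ["Univ"],
    "ProfB": ["Univ", "StudentB"],
    "StudentA": ["ProfA"],
    "StudentB": ["ProfB"],
}
```

Calling simrank(example_graph) and delta_simrank(example_graph) with the same number of iterations should give the same scores up to floating-point rounding. In a MapReduce setting, the inner "push" loop would roughly correspond to a map phase that emits weighted changes keyed by node pair and a reduce phase that sums them, though the paper's actual job design may differ.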