Efficient SimRank-based similarity join over large graphs

Authors:
Weiguo Zheng; Lei Zou; Yansong Feng; Lei Chen; Dongyan Zhao
Affiliations:
Peking University, China; Peking University, China; Peking University, China; Hong Kong University of Science and Technology, China; Peking University, China
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

Graphs are widely used to model complex data in many real-world applications. Vertex join queries over large graphs are useful in practice: they can support friend recommendation in social networks and link prediction, among other tasks. In this paper, we adopt SimRank to measure the similarity of two vertices in a large graph because of its generality: SimRank depends purely on graph structure and does not rely on domain knowledge. Specifically, we define a SimRank-based join (SRJ) query, which finds all vertex pairs in a data graph G whose SimRank scores satisfy a given threshold. To reduce the search space, we propose an upper bound on SimRank scores, based on estimated shortest-path distances, to prune unpromising vertex pairs. For verification, we propose a novel index, called h-go cover, to efficiently compute the SimRank score of a single vertex pair. Given a graph G, we materialize the SimRank scores of only a small proportion of vertex pairs (called h-go covers), from which the SimRank score of any vertex pair can be computed easily. To handle large graphs, we extend our technique to a partition-based framework. Theoretical analysis and extensive experiments over both real and synthetic datasets confirm the efficiency and effectiveness of our solution.
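To make the SRJ query concrete, the following is a minimal sketch of a naive baseline, not the pruning or h-go cover method described in the abstract. It computes SimRank for every vertex pair with the standard iterative definition and then keeps the pairs at or above the threshold. The function names `simrank` and `simrank_join`, the decay factor `c=0.8`, the iteration count, and the reading of "satisfying the threshold" as "score >= threshold" are all assumptions made for illustration.

```python
from itertools import combinations


def simrank(in_neighbors, c=0.8, iterations=10):
    """Naive iterative SimRank (illustrative baseline, assumed parameters).

    in_neighbors maps each vertex to the list of its in-neighbors.
    Returns a nested dict sim[u][v] with the score after `iterations` rounds.
    """
    nodes = list(in_neighbors)
    # Initial scores: 1 on the diagonal, 0 elsewhere.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
    for _ in range(iterations):
        new = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
        for u, v in combinations(nodes, 2):
            iu, iv = in_neighbors[u], in_neighbors[v]
            if not iu or not iv:
                continue  # a vertex without in-neighbors scores 0 with others
            # Average similarity over all pairs of in-neighbors, decayed by c.
            total = sum(sim[a][b] for a in iu for b in iv)
            new[u][v] = new[v][u] = c * total / (len(iu) * len(iv))
        sim = new
    return sim


def simrank_join(in_neighbors, threshold, c=0.8, iterations=10):
    """Return all distinct vertex pairs whose score is >= threshold (assumed)."""
    sim = simrank(in_neighbors, c, iterations)
    nodes = list(in_neighbors)
    return [(u, v, sim[u][v])
            for u, v in combinations(nodes, 2)
            if sim[u][v] >= threshold]


# Toy graph (hypothetical): a -> b, a -> c, b -> d.
graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b"]}
pairs = simrank_join(graph, threshold=0.5)
```

On this toy graph only `b` and `c` share an in-neighbor, so they form the single pair returned. The baseline evaluates every vertex pair in every iteration, which is the cost that the pruning bound and index in the abstract aim to avoid.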