Graphs are widely used to model complex data in many real-world applications. Answering vertex join queries over large graphs benefits tasks such as friend recommendation in social networks and link prediction. In this paper, we adopt SimRank to evaluate the similarity of two vertices in a large graph because of its generality: SimRank depends purely on graph structure and does not rely on domain knowledge. Specifically, we define a SimRank-based join (SRJ) query, which finds all vertex pairs in a data graph G whose SimRank scores satisfy a given threshold. To reduce the search space, we propose an upper bound for SimRank scores, based on estimated shortest-path distances, that prunes unpromising vertex pairs. For verification, we propose a novel index, called h-go cover, to efficiently compute the SimRank score of a single vertex pair. Given a graph G, we materialize the SimRank scores of only a small proportion of vertex pairs (called h-go covers), from which the SimRank score of any vertex pair can be computed easily. To handle large graphs, we extend our technique to a partition-based framework. Thorough theoretical analysis and extensive experiments on both real and synthetic datasets confirm the efficiency and effectiveness of our solution.
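To make the SRJ query concrete, the following is a minimal baseline sketch in Python. It uses the standard fixed-point iteration for SimRank and then filters pairs by a threshold; it does not implement the distance-based pruning, the h-go cover index, or the partition-based framework described above. The function names `simrank` and `simrank_join`, the `in_neighbors` adjacency format, the decay factor `C = 0.8`, and the iteration count are all assumptions chosen for illustration.

```python
from itertools import combinations


def simrank(in_neighbors, C=0.8, iterations=10):
    """Naive iterative SimRank over all vertex pairs.

    in_neighbors maps every vertex to the list of its in-neighbors
    (hypothetical input format). Suitable only for small graphs.
    """
    nodes = list(in_neighbors)
    # Initial scores: a vertex is fully similar to itself, and to nothing else.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
    for _ in range(iterations):
        new = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[u][v] = 1.0
                    continue
                iu, iv = in_neighbors[u], in_neighbors[v]
                if not iu or not iv:
                    # A vertex without in-neighbors has score 0 with any other vertex.
                    new[u][v] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors, damped by C.
                total = sum(sim[a][b] for a in iu for b in iv)
                new[u][v] = C * total / (len(iu) * len(iv))
        sim = new
    return sim


def simrank_join(in_neighbors, threshold, C=0.8, iterations=10):
    """Return all unordered vertex pairs whose SimRank score is >= threshold."""
    sim = simrank(in_neighbors, C, iterations)
    return [
        (u, v)
        for u, v in combinations(sorted(in_neighbors), 2)
        if sim[u][v] >= threshold
    ]


# Toy graph with edges a->b, a->c, b->e, c->f.
toy_graph = {"a": [], "b": ["a"], "c": ["a"], "e": ["b"], "f": ["c"]}
result = simrank_join(toy_graph, threshold=0.7)
```

On this toy graph, `b` and `c` share their only in-neighbor `a`, so their score is `C = 0.8`, while `e` and `f` inherit a score of about `0.64` through `b` and `c`. With a threshold of 0.7 only the pair `("b", "c")` is returned. The baseline scores every vertex pair in each iteration, which is the cost that pruning and indexing techniques aim to avoid on large graphs.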