Fast incremental proximity search in large graphs

Authors:
Purnamrita Sarkar;Andrew W. Moore;Amit Prakash
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Google Inc., Pittsburgh, PA;Google Inc., Pittsburgh, PA
Venue:
Proceedings of the 25th international conference on Machine learning
Year:
2008


Abstract

In this paper we investigate two aspects of ranking problems on large graphs. First, we augment the deterministic pruning algorithm of Sarkar and Moore (2007) with sampling techniques to compute, at query time, rankings under random-walk-based proximity measures that are approximately correct with high probability. Second, we prove some surprising locality properties of these proximity measures by examining the short-term behavior of random walks. The proposed algorithm can answer queries on the fly without caching any information about the entire graph. We present empirical results on a 600,000-node author-word-citation graph from the Citeseer domain, on a single-CPU machine, where the average query processing time is around 4 seconds. We also present quantifiable link prediction tasks; on most of them our techniques outperform Personalized PageRank, a well-known diffusion-based proximity measure.
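
To make the sampling idea concrete, the following is a minimal sketch of ranking nodes by a sampled random-walk proximity measure. It is not the algorithm of the paper: the abstract does not say which proximity measure is used, so the sketch assumes a truncated hitting time (the expected number of steps to first reach a node, capped at a horizon), and it omits the deterministic pruning step entirely. The function name `estimate_truncated_hitting_times`, its parameters, and the toy graph are all hypothetical.

```python
import random
from collections import defaultdict


def estimate_truncated_hitting_times(adj, source, horizon, num_walks, seed=0):
    """Monte Carlo estimate of horizon-truncated hitting times from `source`.

    `adj` maps each node to a list of its neighbors (hypothetical format).
    Only nodes reached by at least one sampled walk are returned; every
    other node implicitly has the maximum value `horizon`.
    """
    rng = random.Random(seed)
    hit_time_sum = defaultdict(int)  # sum of first-hit steps per node
    hit_count = defaultdict(int)     # number of walks that reached the node

    for _ in range(num_walks):
        first_hit = {source: 0}
        current = source
        for step in range(1, horizon + 1):
            neighbors = adj.get(current, [])
            if not neighbors:
                break  # dead end: the walk cannot continue
            current = rng.choice(neighbors)
            first_hit.setdefault(current, step)  # keep only the first visit
        for node, step in first_hit.items():
            hit_time_sum[node] += step
            hit_count[node] += 1

    estimates = {}
    for node, count in hit_count.items():
        missed = num_walks - count  # walks that never reached this node
        estimates[node] = (hit_time_sum[node] + missed * horizon) / num_walks
    return estimates


# Toy usage on a 4-node path graph 0 - 1 - 2 - 3 (assumed example data).
toy_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = estimate_truncated_hitting_times(toy_graph, source=0, horizon=5, num_walks=500)
ranking = sorted(scores, key=scores.get)  # smaller hitting time = closer
print(ranking)
```

Because each walk is at most `horizon` steps long, the sketch only touches nodes near the query node, which loosely mirrors the locality and no-global-caching points made in the abstract. The number of walks needed for a ranking that is correct with high probability is not derived here.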