Accuracy estimate and optimization techniques for SimRank computation

Authors:
Dmitry Lizorkin;Pavel Velikhov;Maxim Grinev;Denis Turdakov
Affiliations:
Institute for System Programming of the Russian Academy of Sciences, Moscow, Russia 109004;Institute for System Programming of the Russian Academy of Sciences, Moscow, Russia 109004;Institute for System Programming of the Russian Academy of Sciences, Moscow, Russia 109004;Institute for System Programming of the Russian Academy of Sciences, Moscow, Russia 109004
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2010

Abstract

A measure of similarity between objects is a very useful tool in many areas of computer science, including information retrieval. SimRank is a simple and intuitive measure of this kind, based on a graph-theoretic model. SimRank is typically computed iteratively, in the spirit of PageRank. However, existing work on SimRank lacks an accuracy estimate for the iterative computation and has discouraging time complexity. In this paper, we present a technique to estimate the accuracy of computing SimRank iteratively. This technique provides a way to find the number of iterations required to achieve a desired accuracy when computing SimRank. We also present optimization techniques that improve the computational complexity of the iterative algorithm from O(n^4) in the worst case to min(O(nl), O(n^3 / log_2 n)), with n denoting the number of objects and l denoting the number of object-to-object relationships. We also introduce a threshold-sieving heuristic, together with its accuracy estimate, that further improves the efficiency of the method. As a practical illustration of our techniques, we computed SimRank scores on a subset of the English Wikipedia corpus, consisting of the complete set of articles and category links.
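
For readers unfamiliar with the iteration the abstract refers to, the following is a minimal Python sketch of the basic (unoptimized) SimRank iteration, whose worst-case cost is the O(n^4) baseline mentioned above. It is not the optimized algorithm of the paper and omits threshold sieving. The function names, the decay factor default c = 0.8, and the dictionary-based graph representation are illustrative assumptions. The helper iterations_for_accuracy assumes an error bound of the form c^(k+1) after k iterations, which the abstract itself does not state; treat it as an assumption to check against the full paper.

```python
from itertools import product


def iterations_for_accuracy(eps, c=0.8):
    """Smallest k such that c**(k+1) <= eps.

    Assumption: the error after k iterations is bounded by c**(k+1).
    """
    k = 0
    while c ** (k + 1) > eps:
        k += 1
    return k


def simrank(in_neighbors, c=0.8, iterations=5):
    """Naive iterative SimRank.

    in_neighbors maps each node to the list of nodes that link to it
    (hypothetical representation chosen for this sketch).
    """
    nodes = list(in_neighbors)
    # Iteration 0: a node is similar only to itself.
    scores = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_scores = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_scores[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors has similarity 0 to any other node.
                new_scores[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, damped by c.
            total = sum(scores[(i, j)] for i in ia for j in ib)
            new_scores[(a, b)] = c * total / (len(ia) * len(ib))
        scores = new_scores
    return scores


if __name__ == "__main__":
    # Tiny example: u links to both a and b.
    graph = {"u": [], "a": ["u"], "b": ["u"]}
    k = iterations_for_accuracy(0.001, c=0.8)
    result = simrank(graph, c=0.8, iterations=k)
    print(k, result[("a", "b")])
```

The double loop over node pairs, each summing over all pairs of in-neighbors, is what makes this formulation expensive on dense graphs and is the cost the optimizations in the paper are aimed at.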