Accuracy estimate and optimization techniques for SimRank computation

Authors:
Dmitry Lizorkin; Pavel Velikhov; Maxim Grinev; Denis Turdakov
Affiliations:
Institute for System Programming of the Russian Academy of Sciences (all four authors)
Venue:
Proceedings of the VLDB Endowment
Year:
2008

Abstract

A measure of similarity between objects is a useful tool in many areas of computer science, including information retrieval. SimRank is a simple and intuitive measure of this kind, based on a graph-theoretic model. SimRank is typically computed iteratively, in the spirit of PageRank. However, existing work on SimRank lacks an accuracy estimate for the iterative computation and has discouraging time complexity. In this paper we present a technique to estimate the accuracy of computing SimRank iteratively. This technique provides a way to determine the number of iterations required to achieve a desired accuracy. We also present optimization techniques that improve the computational complexity of the iterative algorithm from O(n^4) to O(n^3) in the worst case. In addition, we introduce a threshold-sieving heuristic, together with its accuracy estimate, that further improves the efficiency of the method. As a practical illustration of our techniques, we computed SimRank scores on a subset of the English Wikipedia corpus, consisting of the complete set of articles and category links.
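
The following Python sketch illustrates the two main ideas of the abstract on a toy graph. It is not the authors' implementation, and the abstract does not spell out the details, so several points are assumptions: the error bound C^(k+1) after k iterations (the estimate usually attributed to this paper), the use of memoized partial sums over in-neighbours as the optimization, the decay factor 0.8, and the five-node graph and all function names, which are hypothetical. Threshold sieving is not shown.

```python
def iterations_for_accuracy(c, eps):
    """Smallest k with c**(k+1) <= eps.

    Assumes the error after k iterations is bounded by c**(k+1);
    this bound is not stated in the abstract itself.
    """
    if not (0.0 < c < 1.0 and 0.0 < eps < 1.0):
        raise ValueError("c and eps must lie strictly between 0 and 1")
    k = 0
    while c ** (k + 1) > eps:
        k += 1
    return k


def _identity(nodes):
    # s_0(a, b) = 1 if a == b, else 0.
    return {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}


def simrank_naive(in_nbrs, c, k):
    """Plain iteration: every pair sums over all pairs of in-neighbours."""
    nodes = list(in_nbrs)
    s = _identity(nodes)
    for _ in range(k):
        nxt = _identity(nodes)
        for a in nodes:
            for b in nodes:
                if a != b and in_nbrs[a] and in_nbrs[b]:
                    total = sum(s[i][j] for i in in_nbrs[a] for j in in_nbrs[b])
                    nxt[a][b] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        s = nxt
    return s


def simrank_partial_sums(in_nbrs, c, k):
    """Same result, but the inner sum over in-neighbours of `a` is
    computed once per `a` and reused for every `b` (assumed optimization)."""
    nodes = list(in_nbrs)
    s = _identity(nodes)
    for _ in range(k):
        nxt = _identity(nodes)
        for a in nodes:
            if not in_nbrs[a]:
                continue
            # partial[j] = sum of s(i, j) over in-neighbours i of a
            partial = {j: sum(s[i][j] for i in in_nbrs[a]) for j in nodes}
            for b in nodes:
                if a != b and in_nbrs[b]:
                    total = sum(partial[j] for j in in_nbrs[b])
                    nxt[a][b] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        s = nxt
    return s


# Hypothetical toy graph: node -> list of in-neighbours.
IN_NBRS = {
    "univ": ["student_a"],
    "prof_a": ["univ"],
    "prof_b": ["univ", "student_b"],
    "student_a": ["prof_a"],
    "student_b": ["prof_b"],
}

if __name__ == "__main__":
    k = iterations_for_accuracy(0.8, 0.001)
    scores = simrank_partial_sums(IN_NBRS, 0.8, k)
    print(k, scores["prof_a"]["prof_b"])
```

Both functions return the same scores; the second one trades a dictionary of partial sums per node for one fewer nested loop over in-neighbours, which is where a reduction of the kind described in the abstract would come from.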