A scalable randomized method to compute link-based similarity rank on the web graph

Authors:
Dániel Fogaras; Balázs Rácz
Affiliations:
Computer and Automation Research Institute of the Hungarian Academy of Sciences; Computer and Automation Research Institute of the Hungarian Academy of Sciences
Venue:
EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
Year:
2004

Abstract

Several iterative hyperlink-based similarity measures have been published to express the similarity of web pages. However, it usually seems hopeless to evaluate complex similarity functions over large repositories containing hundreds of millions of pages. We introduce scalable algorithms for computing SimRank scores, which express the contextual similarity of pages based on the hyperlink structure. The proposed methods scale well to large repositories and meet strict requirements on computational complexity. The algorithms were tested on a set of ten million pages, but parallelization techniques make it possible to compute SimRank scores even for the entire web of over 4 billion pages. The key idea is that randomized Monte Carlo methods combined with indexing techniques yield a scalable approximation of SimRank.
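The abstract names only the ingredients: Monte Carlo sampling plus an index. The sketch below shows one way these ingredients can fit together. It is not the paper's implementation, and its index layout and parameters are not the authors'.

It relies on SimRank's random-surfer-pairs reading: s(a, b) = E[c^τ], where τ is the first step at which reverse random walks (following in-links) started from a and b stand on the same page. The remaining choices are illustrative assumptions:

- The graph is a dict from each node to its list of in-neighbors.
- The functions `build_fingerprints` and `simrank_estimate` are hypothetical.
- `num_samples`, `max_len` and the decay `c = 0.8` are illustrative values.
- Walks are coupled. In each sample and step, every node draws one random in-neighbor, and all walks currently at that node follow it. Walks that meet therefore stay together, so the index needs only one stored path per node per sample.

```python
import random
from typing import Dict, List

# Hypothetical representation: node -> list of in-neighbors (pages linking to it).
Graph = Dict[int, List[int]]
Index = List[Dict[int, List[int]]]  # one {node: truncated reverse walk} per sample


def build_fingerprints(in_links: Graph, num_samples: int, max_len: int,
                       seed: int = 0) -> Index:
    """Precompute one coupled reverse random walk of length <= max_len per node,
    for each of num_samples independent samples (illustrative index only)."""
    rng = random.Random(seed)
    nodes = list(in_links)
    index: Index = []
    for _ in range(num_samples):
        # choice[t][v]: the in-neighbor taken by any walk standing at v at step t.
        # Nodes without in-links get no entry, so walks reaching them stop.
        choice = [
            {v: rng.choice(in_links[v]) for v in nodes if in_links[v]}
            for _ in range(max_len)
        ]
        walks: Dict[int, List[int]] = {}
        for v in nodes:
            path, cur = [v], v
            for t in range(max_len):
                if cur not in choice[t]:
                    break  # dead end: no in-links, so this walk can never meet another
                cur = choice[t][cur]
                path.append(cur)
            walks[v] = path
        index.append(walks)
    return index


def simrank_estimate(index: Index, a: int, b: int, c: float = 0.8) -> float:
    """Average c**tau over samples, where tau is the first step at which the
    stored walks from a and b coincide (contributing 0 if they never meet)."""
    total = 0.0
    for walks in index:
        pa, pb = walks[a], walks[b]
        for t in range(min(len(pa), len(pb))):
            if pa[t] == pb[t]:
                total += c ** t
                break
    return total / len(index)
```

Two walks that have not yet met sit on different nodes, whose per-step choices are drawn independently. Coupling therefore leaves each pair's first-meeting distribution unchanged while letting all pair queries share the same stored paths. The standard error of the estimate shrinks roughly like 1/sqrt(num_samples). Truncating walks at `max_len` ignores meetings after that step, and such meetings contribute at most c^(max_len + 1).

For example, in a toy graph where pages 0 and 1 are both linked only from page 2, every sample's walks meet at step 1. The estimate is then c = 0.8, which matches the exact SimRank score c · s(2, 2).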