Several iterative hyperlink-based similarity measures have been published to express the similarity of web pages. However, it usually seems hopeless to evaluate complex similarity functions over large repositories containing hundreds of millions of pages. We introduce scalable algorithms for computing SimRank scores, which express the contextual similarity of pages based on the hyperlink structure. The proposed methods scale well to large repositories and fulfill strict requirements on computational complexity. The algorithms were tested on a set of ten million pages, but parallelization techniques make it possible to compute SimRank scores even for the entire web with over 4 billion pages. The key idea is that randomized Monte Carlo methods combined with indexing techniques yield a scalable approximation of SimRank.
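
To make the Monte Carlo idea concrete, here is a minimal sketch that relies on the standard random-walk reading of SimRank: the score of two pages equals the expected value of c^t, where t is the first step at which two walks, each following in-links backwards, land on the same page. This is not the indexing scheme proposed here. It simulates independent walk pairs at query time, and the function name `estimate_simrank`, the `in_links` dictionary layout, and all parameter defaults are assumptions chosen for illustration.

```python
import random


def estimate_simrank(in_links, u, v, decay=0.6, num_walks=1000,
                     max_steps=10, seed=0):
    """Monte Carlo estimate of SimRank(u, v) from pairs of reverse random walks.

    in_links maps each page to the list of pages that link to it
    (hypothetical layout, assumed for this sketch).
    """
    if u == v:
        return 1.0  # a page is maximally similar to itself by definition
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_walks):
        a, b = u, v
        for step in range(1, max_steps + 1):
            preds_a = in_links.get(a, [])
            preds_b = in_links.get(b, [])
            if not preds_a or not preds_b:
                break  # one walk cannot continue, so this pair contributes 0
            # Both walks move one step backwards along a random in-link.
            a = rng.choice(preds_a)
            b = rng.choice(preds_b)
            if a == b:
                total += decay ** step  # first meeting at this step
                break
    return total / num_walks


# Two pages whose only in-link comes from the same hub always meet after one step.
tiny_graph = {"a": ["hub"], "b": ["hub"]}
score = estimate_simrank(tiny_graph, "a", "b")
```

Truncating walks at `max_steps` ignores meetings that would occur later, which biases the estimate downward by at most `decay ** (max_steps + 1)`. Precomputing and storing the walks per page, rather than simulating them for each query, is where an indexing technique would come in.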