I/O-efficient techniques for computing pagerank

Authors:
Yen-Yu Chen; Qingqing Gan; Torsten Suel
Affiliations:
Polytechnic University, Brooklyn, NY; Polytechnic University, Brooklyn, NY; Polytechnic University, Brooklyn, NY
Venue:
Proceedings of the eleventh international conference on Information and knowledge management
Year:
2002

Abstract

Over the last few years, most major search engines have integrated link-based ranking techniques in order to provide more accurate search results. One widely known approach is the Pagerank technique, which forms the basis of the Google ranking scheme, and which assigns a global importance measure to each page based on the importance of other pages pointing to it. The main advantage of the Pagerank measure is that it is independent of the query posed by a user; this means that it can be precomputed and then used to optimize the layout of the inverted index structure accordingly. However, computing the Pagerank measure requires implementing an iterative process on a massive graph corresponding to billions of web pages and hyperlinks.

In this paper, we study I/O-efficient techniques to perform this iterative computation. We derive two algorithms for Pagerank based on techniques proposed for out-of-core graph algorithms, and compare them to two existing algorithms proposed by Haveliwala. We also consider the implementation of a recently proposed topic-sensitive version of Pagerank. Our experimental results show that for very large data sets, significant improvements over previous results can be achieved on machines with moderate amounts of memory. On the other hand, at most minor improvements are possible on data sets that are only moderately larger than memory, which is the case in many practical scenarios.
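To make the iterative process mentioned in the abstract concrete, the following is a minimal in-memory sketch of the standard Pagerank power iteration. It is not one of the I/O-efficient algorithms studied in the paper, which target graphs too large for main memory. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the uniform redistribution of rank from pages without out-links, and the toy graph are all assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power iteration on a graph given as {page: [pages it links to]}.

    Illustrative sketch only: the whole graph and both rank vectors are
    held in memory, which is exactly what does not scale to web-sized data.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(out_links) | {q for ts in out_links.values() for q in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts with the uniform "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        dangling = 0.0
        for p in pages:
            targets = out_links.get(p, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumption: rank of pages without out-links is spread uniformly.
                dangling += damping * rank[p]
        for p in pages:
            new_rank[p] += dangling / n
        rank = new_rank
    return rank


# Hypothetical four-page graph used only to exercise the sketch.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

Each iteration reads every link once and updates the rank of its destination. On a graph with billions of links, neither the link structure nor the rank vectors may fit in memory, and the destination updates touch the rank vector in an essentially random order, which is why the access pattern of this loop becomes the main concern.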