Outlink estimation for pagerank computation under missing data

Authors:
Sreangsu Acharyya;Joydeep Ghosh
Affiliations:
University of Texas, Austin, TX;University of Texas, Austin, TX
Venue:
Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
Year:
2004

Abstract

The enormity and rapid growth of the web-graph forces quantities such as its pagerank to be computed under missing information consisting of outlinks of pages that have not yet been crawled. This paper examines the role played by the size and distribution of this missing data in determining the accuracy of the computed pagerank, focusing on questions such as (i) the accuracy of pageranks under missing information, (ii) the size at which a crawl process may be aborted while still ensuring reasonable accuracy of pageranks, and (iii) algorithms to estimate pageranks under such missing information. The first two questions are addressed on the basis of certain simple bounds relating the expected distance between the true and computed pageranks to the size of the missing data. The third question is explored by devising algorithms to predict the pageranks when full information is not available. A key feature of the proposed "dangling link estimation" and "clustered link estimation" algorithms is that they do not need to run the pagerank iteration afresh once the outlinks have been estimated.
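
To make the setting concrete, the following is a minimal sketch of how missing outlinks can affect a computed pagerank. It is not the paper's "dangling link estimation" or "clustered link estimation" algorithm, which the abstract does not specify. It assumes the common convention that a page with no known outlinks spreads its rank uniformly over all pages, and it measures the L1 distance between the pageranks of a full graph and a partial crawl. The function names `pagerank` and `l1_distance`, the damping factor 0.85, and the four-page toy graph are all illustrative assumptions.

```python
def pagerank(outlinks, n, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration pagerank on pages 0..n-1.

    Assumption: a page with no known outlinks (uncrawled or truly dangling)
    distributes its rank uniformly over all n pages.
    """
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank mass held by pages whose outlinks are unknown or empty.
        dangling_mass = sum(rank[i] for i in range(n) if not outlinks.get(i))
        new_rank = [(1.0 - damping) / n + damping * dangling_mass / n] * n
        for src, targets in outlinks.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


def l1_distance(p, q):
    """L1 distance between two rank vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical toy graph: page 3 has been discovered but not yet crawled
# in the partial view, so its outlinks are missing.
full_graph = {0: [1, 2], 1: [2, 3], 2: [0], 3: [0]}
partial_graph = {0: [1, 2], 1: [2, 3], 2: [0]}

true_rank = pagerank(full_graph, 4)
approx_rank = pagerank(partial_graph, 4)
error = l1_distance(true_rank, approx_rank)
print("L1 error caused by the missing outlinks:", error)
```

Repeating this comparison while varying how many pages are left uncrawled gives an empirical view of the relationship between the size of the missing data and pagerank accuracy that the paper's bounds address analytically.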