Using PageRank to Characterize Web Structure
COCOON '02 Proceedings of the 8th Annual International Conference on Computing and Combinatorics
Using hyperlink features to personalize web search
WebKDD'04 Proceedings of the 6th international conference on Knowledge Discovery on the Web: advances in Web Mining and Web Usage Analysis
Hi-index | 0.00 |
The enormity and rapid growth of the web-graph forces quantities such as its pagerank tobe computed under missing information consisting of outlinks of pages that have not yet been crawled. This paper examines the role played by the size and distribution of this missing data in determining the accuracy of the computed pagerank, focusing on questions such as (i) the accuracy of pageranks under missing information, (ii) the size at which a crawl process may be aborted while still ensuring reasonable accuracy of pageranks, and (iii) algorithms to estimate pageranks under such missing information. Thefirst couple of questions are addressed on the basis of certain simple bounds relating the expected distance between the true and computed pageranks and the size of the missing data. The third question is explored by devising algorithms to predict the pageranks when full information is not available. A key feature of the "dangling link estimation" and "clustered link estimation" algorithms proposed is that, they do not need to run the pagerank iteration afresh once the outlinks have been estimated.