Traps and Pitfalls of Topic-Biased PageRank

Authors:
Paolo Boldi; Roberto Posenato; Massimo Santini; Sebastiano Vigna
Affiliations:
Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Italy; Dipartimento di Informatica, Università degli Studi di Verona, Italy; Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Italy; Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Italy
Venue:
Algorithms and Models for the Web-Graph
Year:
2007

Abstract

We discuss a number of issues in the definition, computation and comparison of PageRank values that have been addressed sparsely in the literature, often with contradictory approaches. We study the difference between weakly and strongly preferential PageRank, which patch the dangling nodes with different distributions, extending analytical formulae known for the strongly preferential case, and corroborating our results with experiments on a snapshot of 100 million pages of the .uk domain. The experiments show that the two PageRank versions are poorly correlated, and results about each one cannot be blindly applied to the other; moreover, our computations highlight some new concerns about the usage of exchange-based correlation indices (such as Kendall's τ) on approximated rankings.
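
To make the distinction in the abstract concrete, the following is a minimal sketch, not the authors' code. It assumes the usual formulation in which a preference vector v drives teleportation and a second distribution u replaces the rows of dangling nodes: u = v for the strongly preferential variant and u uniform for the weakly preferential one. The function names, the four-node graph, the damping factor 0.85 and the iteration count are all illustrative assumptions. Kendall's τ is computed in its simplest form (τ-a, with no special treatment of ties).

```python
def pagerank(out_links, v, u, alpha=0.85, iters=200):
    """Power iteration for PageRank (hypothetical helper, illustrative only).

    out_links[i] lists the successors of node i; an empty list marks a
    dangling node. v is the preference (teleportation) vector and u is the
    distribution used to patch dangling nodes.
    """
    n = len(out_links)
    r = list(v)
    for _ in range(iters):
        # Teleportation term, always driven by the preference vector v.
        nxt = [(1.0 - alpha) * v[i] for i in range(n)]
        dangling_mass = 0.0
        for i, targets in enumerate(out_links):
            if targets:
                share = alpha * r[i] / len(targets)
                for j in targets:
                    nxt[j] += share
            else:
                dangling_mass += r[i]
        # Rank held by dangling nodes is redistributed according to u.
        for j in range(n):
            nxt[j] += alpha * dangling_mass * u[j]
        r = nxt
    return r


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy graph (an assumption for illustration): node 3 is dangling.
graph = [[1, 2], [2, 3], [0], []]
n = len(graph)
v = [1.0, 0.0, 0.0, 0.0]      # topic bias concentrated on node 0
uniform = [1.0 / n] * n

strong = pagerank(graph, v, u=v)        # strongly preferential: u = v
weak = pagerank(graph, v, u=uniform)    # weakly preferential: u uniform

print("strongly preferential:", strong)
print("weakly preferential:  ", weak)
print("Kendall's tau:", kendall_tau(strong, weak))
```

On a graph this small the two rankings may well agree; the sketch only shows where the two definitions diverge (the single line that redistributes dangling mass), not the scale of disagreement reported for the .uk snapshot.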