Link-based ranking algorithms are among the most important techniques for improving web search. In particular, the PageRank algorithm has been used successfully in the Google search engine and has attracted much attention recently. However, we find that PageRank has a “zero-one gap” problem which, to the best of our knowledge, has not been addressed in any previous work. This problem can potentially be exploited to spam PageRank results and to render state-of-the-art link-based antispamming techniques ineffective. The zero-one gap problem arises from the current ad hoc way of computing transition probabilities in the random surfing model. We therefore propose a novel DirichletRank algorithm, which calculates these probabilities using Bayesian estimation with a Dirichlet prior. DirichletRank is a variant of PageRank, but it does not have the zero-one gap problem and can be shown analytically to be substantially more resistant to some forms of link spam than PageRank. Experimental results on TREC data show that DirichletRank achieves better retrieval accuracy than PageRank because of its more reasonable allocation of transition probabilities. More importantly, experiments on the TREC dataset and on another real web dataset from the Webgraph project show that, compared with the original PageRank, DirichletRank is more stable under link perturbation and significantly more robust against both manually identified web spam and several kinds of simulated link spam. DirichletRank can be computed as efficiently as PageRank and is thus scalable to large-scale web applications.
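The zero-one gap and the Dirichlet-prior remedy described above can be illustrated with a minimal sketch. The abstract does not give the formulas, so the specific forms here are assumptions: the PageRank random-jump probability is taken to be 1 for a page with no out-links and a fixed value (0.15) otherwise, and the Dirichlet-smoothed jump probability is taken to be `mu / (n + mu)` for a page with `n` out-links. The function names and the value `mu = 20` are hypothetical and chosen only for illustration.

```python
def pagerank_jump_prob(out_degree, jump=0.15):
    """Assumed PageRank behaviour: a page with no out-links always jumps,
    while any page with at least one out-link jumps with a fixed probability."""
    return 1.0 if out_degree == 0 else jump


def dirichlet_jump_prob(out_degree, mu=20.0):
    """Assumed Dirichlet-smoothed jump probability: the prior mass mu is
    weighed against the observed out-link count, so the value decays
    smoothly as the out-degree grows."""
    return mu / (out_degree + mu)


def dirichlet_transition_row(out_links, num_pages, mu=20.0):
    """Transition probabilities from one page to every page (a sketch).

    out_links: list of target page indices (0-based) linked from this page.
    Each observed link contributes 1 / (n + mu); the remaining prior mass
    mu / (n + mu) is spread uniformly over all pages.
    """
    n = len(out_links)
    row = [mu / (n + mu) / num_pages] * num_pages
    for target in out_links:
        row[target] += 1.0 / (n + mu)
    return row


if __name__ == "__main__":
    # Compare how the jump probability changes with out-degree.
    for n in (0, 1, 2, 10, 100):
        print(n, pagerank_jump_prob(n), round(dirichlet_jump_prob(n), 3))
```

Under these assumptions, the PageRank jump probability drops abruptly from 1 to 0.15 when a page gains its first out-link, whereas the smoothed value moves only from 1 to about 0.95. That abrupt drop is the kind of gap a single added link could exploit.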