Web Page Rank Prediction with PCA and EM Clustering

Authors:
Polyxeni Zacharouli;Michalis Titsias;Michalis Vazirgiannis
Affiliations:
Univ. of Economics and Business, Athens, Greece;School of Computer Science, University of Manchester, UK;Univ. of Economics and Business, Athens, Greece
Venue:
WAW '09 Proceedings of the 6th International Workshop on Algorithms and Models for the Web-Graph
Year:
2009

Abstract

In this paper we describe learning algorithms for Web page rank prediction. We consider linear regression models and combinations of regression with probabilistic clustering and Principal Components Analysis (PCA). These models are learned from time-series data sets and can predict the ranking of a set of Web pages at some future time. The first algorithm uses separate linear regression models. This is further extended by applying probabilistic clustering based on the EM algorithm. Clustering allows Web pages to be grouped together by fitting a mixture of regression models. A different method combines linear regression with PCA so that dependencies between different Web pages can be exploited. All the methods are evaluated using real data sets obtained from the Internet Archive, Wikipedia and Yahoo! ranking lists. We also study the temporal robustness of the prediction framework. Overall, the system constitutes a set of tools for high-accuracy page rank prediction that can be used for efficient resource management by search engines.
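
The first approach mentioned in the abstract, separate linear regression models learned from rank time series, might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: fitting one model per page, using raw past ranks as features, the window length `order`, the toy `ranks` matrix, and the helper names `make_windows`, `fit_linear_predictor` and `predict_next` are all assumptions made here for the example. The EM clustering and PCA extensions are not shown.

```python
import numpy as np


def make_windows(series, order):
    """Slice one rank series into (inputs, targets): `order` past values -> next value."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[t - order:t] for t in range(order, len(series))])
    y = series[order:]
    return X, y


def fit_linear_predictor(series, order=2):
    """Least-squares fit of next_rank ~ w[:order] . past_ranks + w[order] (bias)."""
    X, y = make_windows(series, order)
    X = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def predict_next(series, w):
    """Predict the value that follows the last `order` observations of `series`."""
    order = len(w) - 1
    recent = np.asarray(series[-order:], dtype=float)
    return float(recent @ w[:-1] + w[-1])


# Hypothetical toy data: one row per Web page, one column per snapshot.
ranks = np.array([
    [5.0, 4.0, 4.0, 3.0, 2.0, 2.0, 1.0, 1.0],
    [1.0, 2.0, 2.0, 3.0, 5.0, 4.0, 6.0, 7.0],
    [3.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
])

# One separate model per page (an assumption of this sketch).
models = [fit_linear_predictor(r, order=2) for r in ranks]
predictions = [predict_next(r, w) for r, w in zip(ranks, models)]
```

Each entry of `predictions` is a real-valued estimate of that page's next rank; sorting the pages by these estimates would give a predicted ranking for the next snapshot.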