Page importance computation based on Markov processes

Authors:
Bin Gao;Tie-Yan Liu;Yuting Liu;Taifeng Wang;Zhi-Ming Ma;Hang Li
Affiliations:
Microsoft Research Asia, Sigma Center, Haidian District, Beijing, People's Republic of China 100190;Microsoft Research Asia, Sigma Center, Haidian District, Beijing, People's Republic of China 100190;Beijing Jiaotong University, Haidian District, Beijing, People's Republic of China 100044;Microsoft Research Asia, Sigma Center, Haidian District, Beijing, People's Republic of China 100190;Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Haidian District, Beijing, People's Republic of China 100190;Microsoft Research Asia, Sigma Center, Haidian District, Beijing, People's Republic of China 100190
Venue:
Information Retrieval
Year:
2011

Abstract

This paper is concerned with Markov processes for computing page importance. Page importance is a key factor in Web search. Many algorithms, such as PageRank and its variations, have been proposed for computing this quantity in different scenarios, using different data sources, and under different assumptions. Two questions then arise: whether these algorithms can be explained in a unified way, and whether there is a general guideline for designing new algorithms for new scenarios. To answer these questions, we introduce a General Markov Framework. Under the framework, a Web Markov Skeleton Process is used to model the random walk conducted by the web surfer on a given graph. Page importance is then defined as the product of two factors: page reachability, the average probability that the surfer arrives at the page, and page utility, the average value that the page gives to the surfer in a single visit. These two factors can be computed, respectively, as the stationary probability distribution of the corresponding embedded Markov chain and the mean staying time on each page of the Web Markov Skeleton Process. We show that this general framework covers many existing algorithms, including PageRank, TrustRank, and BrowseRank, as special cases. We also show that the framework can help us design new algorithms to handle more complex problems, by constructing graphs from new data sources, employing new members of the Web Markov Skeleton Process family, and using new methods to estimate the two factors. In particular, we demonstrate the use of the framework with a new process, named the Mirror Semi-Markov Process. In the new process, the staying time on a page, as a random variable, is assumed to depend on both the current page and its inlink pages. Our experimental results on both the user browsing graph and the mobile web graph show that the Mirror Semi-Markov Process is more effective than previous models in several tasks, even in the presence of web spam and when the assumption of preferential attachment does not hold.
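
The two-factor definition in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy transition matrix, the staying times, and the final normalization step are all assumptions made for illustration. Reachability is approximated by power iteration on a row-stochastic transition matrix of the embedded Markov chain, and utility is taken to be a given mean staying time per page.

```python
def stationary_distribution(P, iters=1000, tol=1e-12):
    """Approximate pi with pi = pi * P by power iteration.

    P is a row-stochastic matrix (list of lists). Convergence is only
    expected when the chain is irreducible and aperiodic (assumption).
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def page_importance(P, mean_stay):
    """Reachability (stationary probability) times utility (mean staying time).

    The scores are rescaled to sum to 1; this rescaling is an
    illustrative choice, not something stated in the abstract.
    """
    pi = stationary_distribution(P)
    raw = [p * t for p, t in zip(pi, mean_stay)]
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    # Hypothetical 3-page graph: P[i][j] is the probability of moving
    # from page i to page j in the embedded chain.
    P = [
        [0.1, 0.6, 0.3],
        [0.4, 0.2, 0.4],
        [0.5, 0.3, 0.2],
    ]
    mean_stay = [2.0, 1.0, 4.0]  # hypothetical mean staying times
    print(page_importance(P, mean_stay))
```

With equal staying times on every page, the score reduces to the stationary distribution alone, which corresponds to a PageRank-style special case in which utility plays no role.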