Dr. Searcher and Mr. Browser: a unified hyperlink-click graph

Authors:
Barbara Poblete; Carlos Castillo; Aristides Gionis
Affiliations:
University Pompeu Fabra, Barcelona, Spain; Yahoo! Research, Barcelona, Spain; Yahoo! Research, Barcelona, Spain
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Abstract

We introduce a unified graph representation of the Web that includes both structural and usage information. We model this graph as a simple union of the Web's hyperlink and click graphs. The hyperlink graph expresses the link structure among Web pages, while the click graph is a bipartite graph of queries and documents that captures users' search behavior, extracted from a search engine's query log. Our main motivation is to model, in a unified way, the two main activities of users on the Web, searching and browsing, and at the same time to analyze the effects of random walks on this new graph. The intuition behind this task is to measure how much information the combination of link structure and usage data provides beyond what each structure contains on its own. Our experimental results show that both hyperlink and click graphs have strengths and weaknesses when their stationary distribution scores are used to rank Web pages. Furthermore, our evaluation indicates that the unified graph always generates consistent and robust scores that closely follow the best result obtained from either individual graph, even when applied to "noisy" data. We believe that the unified Web graph has several useful properties for improving current Web document ranking, as well as for generating new rankings of its own. In particular, stationary distribution scores derived from random walks on the combined graph can be used as an indicator of whether structural or usage data are more reliable in different situations.
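
The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions not stated in the abstract: edges are unweighted, click edges are treated as bidirectional between a query and a clicked document, and the random walk uses uniform teleportation with a damping factor of 0.85. The function names `union_graph` and `stationary_scores`, the `q:`/`d:` node prefixes, and the toy data are all hypothetical.

```python
from collections import defaultdict


def union_graph(hyperlinks, clicks):
    """Return the edge set of the union of a hyperlink graph and a click graph.

    hyperlinks: iterable of (source_doc, target_doc) directed edges.
    clicks: iterable of (query, clicked_doc) pairs from a query log.
    Assumption: each click yields edges in both directions.
    """
    edges = set(hyperlinks)
    for query, doc in clicks:
        edges.add((query, doc))
        edges.add((doc, query))
    return edges


def stationary_scores(edges, damping=0.85, iterations=100):
    """Approximate the stationary distribution of a random walk with
    uniform teleportation, using power iteration."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Probability mass on nodes without out-links is spread uniformly.
        dangling = sum(scores[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_scores = {node: base for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * scores[src] / len(targets)
                for dst in targets:
                    new_scores[dst] += share
        scores = new_scores
    return scores


# Toy data (hypothetical): three documents and one query.
hyperlinks = [("d:a", "d:b"), ("d:b", "d:c"), ("d:c", "d:a")]
clicks = [("q:news", "d:b"), ("q:news", "d:c")]

unified = union_graph(hyperlinks, clicks)
scores = stationary_scores(unified)
hyperlink_only_scores = stationary_scores(set(hyperlinks))
```

Comparing `scores` with `hyperlink_only_scores` for the same document gives a rough sense of how usage data shifts a purely structural ranking, which is the kind of comparison the abstract describes.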