Bayesian Browsing Model: Exact Inference of Document Relevance from Petabyte-Scale Data

Authors:
Chao Liu; Fan Guo; Christos Faloutsos
Affiliations:
Microsoft Research; Carnegie Mellon University; Carnegie Mellon University
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2010

Abstract

A fundamental challenge in using Web search click data is inferring user-perceived relevance from the search log. The inference itself is a difficult statistical reasoning problem, and the sheer size and continual growth of the log data impose additional scalability requirements. In this paper, we propose the Bayesian Browsing Model (BBM), which performs exact inference of document relevance, requires only a single pass over the data (i.e., optimal scalability), and is shown to be effective. We present two sets of experiments to evaluate the model's effectiveness and scalability. On the first set, comprising over 50 million search instances of 1.1 million distinct queries, BBM outperforms the state-of-the-art competitor by 29.2% in log-likelihood while being 57 times faster. On the second click log set, spanning a quarter of a petabyte, we showcase the scalability of BBM: we implemented it on a commercial MapReduce cluster, and it took only 3 hours to compute the relevance for 1.15 billion distinct query-URL pairs.
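
The abstract does not spell out the model, so the following is only an illustrative sketch of how single-pass, exact Bayesian inference of relevance can work in principle. It assumes a simplified click model that is not taken from the paper: a click at position p occurs with probability EXAM_PROB[p] * R, where R is the relevance of a query-URL pair with a uniform prior. The names `EXAM_PROB`, `accumulate`, and `posterior_mean`, and the examination probabilities themselves, are hypothetical. Under these assumptions the posterior depends only on per-pair counts, which one pass over the log can collect.

```python
from collections import defaultdict

# Hypothetical examination probabilities per result position
# (assumed for illustration; not values from the paper).
EXAM_PROB = {1: 0.9, 2: 0.6, 3: 0.4}


def accumulate(log):
    """Collect sufficient statistics in a single pass over the log.

    Each record is (query, url, position, clicked).
    Returns click counts per (query, url) and, for non-clicks,
    counts per (query, url) broken down by position.
    """
    clicks = defaultdict(int)
    skips = defaultdict(lambda: defaultdict(int))
    for query, url, position, clicked in log:
        key = (query, url)
        if clicked:
            clicks[key] += 1
        else:
            skips[key][position] += 1
    return clicks, skips


def posterior_mean(n_clicks, skip_counts, grid=1000):
    """Posterior mean of relevance R in [0, 1] under a uniform prior.

    Unnormalized posterior density (assumed simplified model):
        R**n_clicks * prod_p (1 - EXAM_PROB[p] * R)**skip_counts[p]
    The integral is approximated with a midpoint rule on `grid` cells.
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(grid):
        r = (i + 0.5) / grid
        weight = r ** n_clicks
        for position, count in skip_counts.items():
            weight *= (1.0 - EXAM_PROB[position] * r) ** count
        numerator += r * weight
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    toy_log = [
        ("q", "a", 1, True),
        ("q", "b", 2, False),
        ("q", "a", 1, True),
        ("q", "b", 2, False),
    ]
    clicks, skips = accumulate(toy_log)
    for key in sorted(set(clicks) | set(skips)):
        print(key, round(posterior_mean(clicks[key], skips[key]), 3))
```

Because the counts for different query-URL pairs are independent, this kind of accumulation maps naturally onto a MapReduce-style job keyed by (query, url). For large counts the products above would underflow, so a practical version would work in log space.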