A fundamental challenge in utilizing Web search click data is inferring user-perceived relevance from the search log. The inference is a difficult statistical reasoning problem in its own right, and the sheer size of the log data, which keeps growing, imposes additional scalability requirements. In this paper, we propose the Bayesian Browsing Model (BBM), which performs exact inference of document relevance, requires only a single pass over the data (i.e., optimal scalability), and is shown to be effective. We present two sets of experiments to evaluate the model's effectiveness and scalability. On the first set, comprising over 50 million search instances of 1.1 million distinct queries, BBM outperforms the state-of-the-art competitor by 29.2% in log-likelihood while being 57 times faster. On the second click log set, spanning a quarter of a petabyte, we showcase the scalability of BBM: we implemented it on a commercial MapReduce cluster, and it took only 3 hours to compute the relevance for 1.15 billion distinct query-URL pairs.
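To make the single-pass, MapReduce-friendly property concrete, the following is a minimal sketch and not the BBM itself: the actual BBM update is not given in this abstract and is not reproduced here. As a stand-in, it assumes a much simpler Beta-Bernoulli click model in which each (query, URL) pair has an unknown click probability with a Beta prior. The record layout `(query, url, was_clicked)` and the names `map_shard`, `reduce_shards`, and `posterior_mean` are hypothetical. The point it illustrates is that when the exact posterior depends only on additive counts, one pass per shard plus a merge step is sufficient.

```python
from collections import Counter

# Assumption: a simplified Beta-Bernoulli model stands in for BBM here.
# Each log record is a hypothetical (query, url, was_clicked) tuple.

def map_shard(records):
    """One pass over a shard: count impressions and clicks per (query, url)."""
    shown, clicked = Counter(), Counter()
    for query, url, was_clicked in records:
        shown[(query, url)] += 1
        if was_clicked:
            clicked[(query, url)] += 1
    return shown, clicked

def reduce_shards(partials):
    """Merge per-shard counts; addition makes the merge order irrelevant."""
    shown, clicked = Counter(), Counter()
    for shard_shown, shard_clicked in partials:
        shown.update(shard_shown)      # Counter.update adds counts
        clicked.update(shard_clicked)
    return shown, clicked

def posterior_mean(shown, clicked, key, a=1.0, b=1.0):
    """Exact posterior mean of the click probability under a Beta(a, b) prior."""
    return (clicked[key] + a) / (shown[key] + a + b)

shard_a = [("q1", "u1", True), ("q1", "u2", False)]
shard_b = [("q1", "u1", True), ("q1", "u1", False)]

merged = reduce_shards([map_shard(shard_a), map_shard(shard_b)])
single = map_shard(shard_a + shard_b)
estimate = posterior_mean(*merged, ("q1", "u1"))
```

Here `merged` and `single` hold the same counts, so sharding the log changes nothing about the result. Under the uniform prior, the pair `("q1", "u1")` with 2 clicks in 3 impressions gets a posterior mean of (2 + 1) / (3 + 2) = 0.6.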