Fast top-k retrieval for model based recommendation

  • Authors:
  • Deepak Agarwal; Maxim Gurevich

  • Affiliations:
  • Yahoo! Research, Santa Clara, CA, USA; Yahoo! Research, Santa Clara, CA, USA

  • Venue:
  • Proceedings of the fifth ACM international conference on Web search and data mining
  • Year:
  • 2012

Abstract

A crucial task in many recommender problems like computational advertising, content optimization, and others is to retrieve a small set of items by scoring a large item inventory through some elaborate statistical/machine-learned model. This is challenging since the retrieval has to be fast (a few milliseconds) to load the page quickly. Fast retrieval is well studied in the information retrieval (IR) literature, especially in the context of document retrieval for queries. When queries and documents have sparse representations and relevance is measured through cosine similarity (or some variant thereof), one can build highly efficient retrieval algorithms that scale gracefully to an increasing item inventory. The key components exploited by such algorithms are the sparse query-document representation and the special form of the relevance function. Many machine-learned models used in modern recommender problems do not satisfy these properties, and since brute-force evaluation is not an option with a large item inventory, heuristics that filter out some items are often employed to reduce model computations at runtime. In this paper, we take a two-stage approach where the first stage retrieves top-K items using our approximate procedures and the second stage selects the desired top-k using brute-force model evaluation on the K retrieved items. The main idea of our approach is to reduce the first stage to a standard IR problem, where each item is represented by a sparse feature vector (a.k.a. the vector-space representation) and the query-item relevance score is given by the vector dot product. The sparse item representation is learned to closely approximate the original machine-learned score using retrospective data. Such a reduction allows leveraging extensive work in IR that has resulted in highly efficient retrieval systems. Our approach is model-agnostic, relying only on data generated from the machine-learned model. We obtain significant improvements in the computational cost vs. accuracy tradeoff compared to several baselines in our empirical evaluation on both synthetic models and on a click-through rate (CTR) model used in online advertising.
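
To make the two-stage idea concrete, the sketch below is a minimal, hypothetical illustration rather than the authors' implementation: each item's machine-learned score is approximated by a sparse linear function of query features fitted on retrospective queries, stage one retrieves K candidates via sparse dot products, and stage two reranks only those candidates with the exact model. All names (`model_score`, the hard-thresholding sparsifier, the synthetic data sizes) are illustrative assumptions.

```python
# Hypothetical sketch of the two-stage retrieval idea from the abstract:
# (1) retrieve top-K candidates with a learned sparse dot-product score,
# (2) rerank those K candidates with the exact (expensive) model.
import numpy as np

rng = np.random.default_rng(0)
n_items, n_features, K, k = 1000, 50, 100, 10

# Stand-in for an elaborate machine-learned scorer (e.g., a CTR model).
item_params = rng.normal(size=(n_items, n_features))
def model_score(query, item_id):
    # Nonlinear score: brute-force evaluation over all items would be costly.
    return np.tanh(item_params[item_id] @ query) + 0.1 * np.sum(query ** 2)

# Learn a sparse vector-space representation of each item from retrospective
# queries, so that <item_vector, query> approximates model_score.
retro_queries = rng.normal(size=(200, n_features))
def sparse_fit(scores, queries, n_nonzero=10):
    # Least-squares fit followed by hard thresholding that keeps only the
    # largest-magnitude weights (a crude sparsifier, for illustration only).
    w, *_ = np.linalg.lstsq(queries, scores, rcond=None)
    keep = np.argsort(np.abs(w))[-n_nonzero:]
    sparse_w = np.zeros_like(w)
    sparse_w[keep] = w[keep]
    return sparse_w

item_vectors = np.stack([
    sparse_fit(np.array([model_score(q, i) for q in retro_queries]), retro_queries)
    for i in range(n_items)
])

def retrieve_top_k(query):
    # Stage 1: cheap approximate scores via sparse dot products (a production
    # IR system would serve these from an inverted index).
    approx = item_vectors @ query
    candidates = np.argsort(approx)[-K:]
    # Stage 2: exact model evaluation only on the K retrieved candidates.
    exact = [(model_score(query, i), i) for i in candidates]
    return [i for _, i in sorted(exact, reverse=True)[:k]]

print(retrieve_top_k(rng.normal(size=n_features)))
```

The design choice mirrors the reduction described above: once items live in a sparse vector space and relevance is a dot product, standard IR machinery (inverted indexes, WAND-style pruning) can serve stage one, while the expensive model is evaluated on only K items instead of the full inventory.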