A crucial task in many recommender problems, such as computational advertising and content optimization, is to retrieve a small set of items by scoring a large item inventory with an elaborate statistical or machine-learned model. This is challenging because retrieval has to be fast (within a few milliseconds) for the page to load quickly. Fast retrieval is well studied in the information retrieval (IR) literature, especially in the context of document retrieval for queries. When queries and documents have sparse representations and relevance is measured through cosine similarity (or some variant thereof), one can build highly efficient retrieval algorithms that scale gracefully with increasing item inventory. The key components exploited by such algorithms are the sparse query-document representation and the special form of the relevance function. Many machine-learned models used in modern recommender problems do not satisfy these properties, and since brute-force evaluation is not an option with a large item inventory, heuristics that filter out some items are often employed to reduce model computations at runtime. In this paper, we take a two-stage approach in which the first stage retrieves the top-K items using our approximate procedures and the second stage selects the desired top-k using brute-force model evaluation on the K retrieved items. The main idea of our approach is to reduce the first stage to a standard IR problem, where each item is represented by a sparse feature vector (a.k.a. the vector-space representation) and the query-item relevance score is given by the vector dot product. The sparse item representation is learnt from retrospective data so that it closely approximates the original machine-learned score. Such a reduction allows us to leverage the extensive work in IR that has resulted in highly efficient retrieval systems. Our approach is model-agnostic, relying only on data generated from the machine-learned model. In our empirical evaluation on both synthetic models and a click-through rate (CTR) model used in online advertising, we obtain significant improvements in the computational cost vs. accuracy tradeoff compared to several baselines.
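The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_inverted_index`, `first_stage`, `second_stage`, `two_stage_retrieve`) are hypothetical, the sparse item vectors are assumed to be given already (the paper learns them from retrospective data, which is not shown here), and a plain inverted index stands in for whatever IR engine a real system would use.

```python
import heapq
from collections import defaultdict


def build_inverted_index(item_vectors):
    """Map each feature id to a postings list of (item_id, weight).

    item_vectors: dict of item_id -> sparse vector (dict of feature -> weight).
    Assumption: these sparse vectors were already learnt to approximate
    the full model's score; the learning step is outside this sketch.
    """
    index = defaultdict(list)
    for item_id, vec in item_vectors.items():
        for feature, weight in vec.items():
            index[feature].append((item_id, weight))
    return index


def first_stage(query_vec, index, K):
    """Approximate stage: top-K items by sparse dot product.

    Only items sharing at least one feature with the query are touched,
    which is what makes inverted-index retrieval cheap for sparse vectors.
    """
    scores = defaultdict(float)
    for feature, q_weight in query_vec.items():
        for item_id, weight in index.get(feature, ()):
            scores[item_id] += q_weight * weight
    return heapq.nlargest(K, scores, key=scores.get)


def second_stage(query, candidates, full_model, k):
    """Exact stage: evaluate the expensive model on the K candidates only."""
    return heapq.nlargest(k, candidates, key=lambda item: full_model(query, item))


def two_stage_retrieve(query, query_vec, index, full_model, K, k):
    """Retrieve top-K approximately, then select top-k by brute force."""
    candidates = first_stage(query_vec, index, K)
    return second_stage(query, candidates, full_model, k)
```

In this sketch the expensive model `full_model(query, item)` is called K times per query instead of once per item in the inventory. An item that the first stage misses can never be recovered by the second stage, so the choice of K governs the cost vs. accuracy tradeoff the abstract refers to.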