Web-scale multi-task feature selection for behavioral targeting

Authors:
Amr Ahmed;Mohamed Aly;Abhimanyu Das;Alexander J. Smola;Tasos Anastasakos
Affiliations:
Google Research, Mountain View, CA, USA;Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA;Google Research, Mountain View, CA, USA;Yahoo! Research, Santa Clara, CA, USA
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Citing 10
Cited 1

Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web

STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
An accelerated gradient method for trace norm minimization

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems

SIAM Journal on Imaging Sciences
An architecture for parallel topic models

Proceedings of the VLDB Endowment
Scalable distributed inference of dynamic user interests for behavioral targeting

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Learning to target: what works for behavioral targeting

Proceedings of the 20th ACM international conference on Information and knowledge management
Scalable inference in latent variable models

Proceedings of the fifth ACM international conference on Web search and data mining
Web-scale user modeling for targeting

Proceedings of the 21st international conference companion on World Wide Web
Decoding by linear programming

IEEE Transactions on Information Theory

Scalable hierarchical multitask learning algorithms for conversion optimization in display advertising

Proceedings of the 7th ACM international conference on Web search and data mining

Abstract

A typical behavioral targeting system that optimizes purchase activities, called conversions, faces two main challenges: the web-scale volume of user histories to process on a daily basis, and the relative sparsity of conversions. In this paper, we address these challenges through feature selection. We formulate a multi-task (or group) feature-selection problem among a set of related tasks that share a common set of features, namely advertising campaigns. We apply a group-sparse penalty that combines an l1 and an l2 penalty, together with an associated fast optimization algorithm for distributed parameter estimation. Our algorithm relies on a variant of the well-known Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), a closed-form solution for mixed-norm programming, and a distributed subgradient oracle. To efficiently handle web-scale user histories, we present a distributed inference algorithm for the problem that scales to billions of instances and millions of attributes. We show the superiority of our algorithm in terms of both sparsity and ROC performance over baseline feature selection methods (both single-task l1-regularization and multi-task mutual-information gain).
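
The abstract names three ingredients: a group-sparse (mixed l1/l2) penalty, a closed-form solution for the mixed-norm step, and a FISTA-style accelerated iteration. The sketch below illustrates how these typically fit together on a single machine. It is not the authors' implementation: the function names, the features-by-tasks weight layout, the toy squared loss, and the plain `grad` callable (standing in for the distributed subgradient oracle) are all assumptions made for illustration.

```python
import math

def group_soft_threshold(W, threshold):
    """Closed-form prox of threshold * sum_j ||W[j]||_2.

    Assumed layout: W[j] holds the weights of feature j across all tasks
    (campaigns), so a row is either shrunk jointly or zeroed out for every task.
    """
    out = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
        out.append([scale * w for w in row])
    return out

def fista(grad, W0, step, lam, iters=100):
    """Textbook FISTA for smooth_loss(W) + lam * sum_j ||W[j]||_2.

    grad: callable returning the gradient of the smooth loss at a point
          (a hypothetical stand-in for a distributed oracle).
    step: step size, assumed to be at most 1 / L for an L-Lipschitz gradient.
    """
    W = [row[:] for row in W0]
    Z = [row[:] for row in W0]  # extrapolated point
    t = 1.0
    for _ in range(iters):
        G = grad(Z)
        stepped = [[z - step * g for z, g in zip(zr, gr)]
                   for zr, gr in zip(Z, G)]
        W_next = group_soft_threshold(stepped, step * lam)
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        momentum = (t - 1.0) / t_next
        Z = [[wn + momentum * (wn - w) for wn, w in zip(nr, wr)]
             for nr, wr in zip(W_next, W)]
        W, t = W_next, t_next
    return W

# Toy problem (an assumption): minimize 0.5 * ||W - B||_F^2 + lam * sum_j ||W[j]||_2.
# The gradient of the smooth part is W - B, which is 1-Lipschitz, so step = 1.0.
B = [[3.0, 4.0],   # strong feature: row norm 5.0
     [0.3, 0.4]]   # weak feature: row norm 0.5

def toy_grad(Z):
    return [[z - b for z, b in zip(zr, br)] for zr, br in zip(Z, B)]

W_hat = fista(toy_grad, [[0.0, 0.0], [0.0, 0.0]], step=1.0, lam=1.0, iters=20)
# The weak feature's row is driven to exactly zero for both tasks,
# while the strong feature's row is shrunk but kept.
```

Grouping by row is what makes the selection multi-task in this sketch: a feature is dropped only when its weights are jointly small across all tasks, rather than being thresholded independently per task as a plain l1 penalty would do.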