Proceedings of the 7th ACM international conference on Web search and data mining
A typical behavioral targeting system that optimizes purchase activities, called conversions, faces two main challenges: the web-scale volume of user histories that must be processed daily, and the relative sparsity of conversions. In this paper, we address these challenges through feature selection. We formulate a multi-task (or group) feature-selection problem over a set of related tasks, namely advertising campaigns, that share a common set of features. We apply a group-sparse penalty that combines an l1 and an l2 penalty, together with an associated fast optimization algorithm for distributed parameter estimation. Our algorithm relies on a variant of the well-known Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), a closed-form solution for mixed-norm programming, and a distributed subgradient oracle. To handle web-scale user histories efficiently, we present a distributed inference algorithm for the problem that scales to billions of instances and millions of attributes. We show that our algorithm outperforms baseline feature-selection methods (both single-task l1-regularization and multi-task mutual-information gain) in terms of both sparsity and ROC performance.
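To make the ingredients named in the abstract concrete, the following is a minimal single-machine sketch of a FISTA loop with a group-sparse proximal step. It is not the authors' implementation. It assumes the penalty is the common l1/l2 mixed norm (the sum over features of the l2 norm of that feature's weights across tasks), for which the proximal operator has a closed form. The names `group_soft_threshold`, `fista`, and the `grad` callback are hypothetical. In a distributed setting, `grad` is where a gradient aggregated across machines would be supplied; here it is a plain local function on a toy quadratic loss.

```python
import math


def group_soft_threshold(W, tau):
    """Closed-form prox of tau * sum_j ||W[j]||_2 (assumed l1/l2 mixed norm).

    W is a list of rows; row j holds feature j's weight for every task.
    Each row is shrunk toward zero as a group, so a feature is either
    kept for all tasks or dropped for all tasks.
    """
    out = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        out.append([scale * w for w in row])
    return out


def fista(grad, W0, step, lam, iters=200):
    """FISTA for min_W f(W) + lam * sum_j ||W[j]||_2.

    grad(Y) returns the gradient of the smooth loss f at Y (same shape as Y).
    step should be at most 1/L, where L is the Lipschitz constant of grad.
    """
    W = [row[:] for row in W0]  # current iterate
    Y = [row[:] for row in W0]  # extrapolated (momentum) point
    t = 1.0
    for _ in range(iters):
        G = grad(Y)
        # Gradient step at the extrapolated point, then the group prox.
        V = [[y - step * g for y, g in zip(yr, gr)] for yr, gr in zip(Y, G)]
        W_next = group_soft_threshold(V, step * lam)
        # Standard FISTA momentum update.
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next
        Y = [[wn + beta * (wn - w) for wn, w in zip(nr, r)]
             for nr, r in zip(W_next, W)]
        W, t = W_next, t_next
    return W


# Toy problem (an assumption for illustration): f(W) = 0.5 * ||W - B||_F^2,
# with 3 features and 2 tasks. Its gradient is W - B, with Lipschitz constant 1.
B = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]


def toy_grad(Y):
    return [[y - b for y, b in zip(yr, br)] for yr, br in zip(Y, B)]


W_hat = fista(toy_grad, [[0.0, 0.0] for _ in B], step=0.5, lam=1.0)
selected = [j for j, row in enumerate(W_hat) if any(w != 0.0 for w in row)]
```

On this toy problem only the first feature has a group norm above the threshold, so `selected` keeps that feature and drops the other two for both tasks at once. That joint keep-or-drop behavior is what makes a group penalty act as multi-task feature selection.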