Behavioral targeting (BT) leverages historical user behavior to select the most relevant ads to display to each user. State-of-the-art BT derives a linear Poisson regression model from fine-grained user behavioral data and predicts click-through rate (CTR) from user history. We designed and implemented a highly scalable and efficient solution to BT using the Hadoop MapReduce framework. With our parallel algorithm and the resulting system, we can build more than 450 BT-category models from Yahoo's entire user base within one day, a scale that was out of reach for prior systems. Moreover, our approach has yielded a 20% CTR lift over the existing production system by leveraging a well-grounded probabilistic model fitted to a much larger training dataset. Specifically, our major contributions include: (1) A MapReduce statistical learning algorithm and implementation that achieve optimal data parallelism, task parallelism, and load balance in spite of the typically skewed distribution of domain data. (2) An in-place feature vector generation algorithm with linear time complexity O(n), regardless of the granularity of the sliding target window. (3) An in-memory caching scheme that significantly reduces the number of disk I/Os, making large-scale learning practical. (4) Highly efficient data structures and sparse representations of models and data to enable fast model updates. We believe that our work contributes significantly to solving large-scale machine learning problems of industrial relevance in general. Finally, we report comprehensive experimental results obtained with a proprietary industrial codebase and datasets.
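As a rough illustration of the modeling idea mentioned above, the following single-machine sketch fits a linear Poisson model, where the expected count for a user is the dot product of nonnegative weights and that user's feature vector, and then forms a CTR estimate as the ratio of predicted clicks to predicted views. The abstract does not specify the fitting procedure or how CTR is derived from the model, so the multiplicative update used here (a standard choice for nonnegative linear Poisson models) and the clicks-over-views ratio are both assumptions. The names `fit_linear_poisson` and `predict_ctr` are hypothetical, and nothing here reflects the MapReduce implementation itself.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wj * xj for wj, xj in zip(w, x))


def fit_linear_poisson(X, y, n_iter=200, eps=1e-12):
    """Fit nonnegative weights w so that y_i is modeled as Poisson(w . x_i).

    Assumed technique (not confirmed by the abstract): the multiplicative
    update w_j <- w_j * sum_i(y_i * x_ij / lam_i) / sum_i(x_ij),
    which keeps weights nonnegative and does not decrease the likelihood.
    X must contain nonnegative feature values.
    """
    n_features = len(X[0])
    w = [1.0] * n_features
    col_sums = [sum(row[j] for row in X) for j in range(n_features)]
    for _ in range(n_iter):
        # Expected counts under the current weights, floored to avoid 0-division.
        lam = [max(dot(w, row), eps) for row in X]
        for j in range(n_features):
            if col_sums[j] == 0:
                continue  # feature never observed; leave its weight unchanged
            num = sum(yi * row[j] / li for yi, row, li in zip(y, X, lam))
            w[j] *= num / col_sums[j]
    return w


def predict_ctr(w_click, w_view, x, eps=1e-12):
    """Assumed CTR estimate: predicted clicks divided by predicted views."""
    return dot(w_click, x) / max(dot(w_view, x), eps)


if __name__ == "__main__":
    # Toy user histories: each row holds counts for two hypothetical features.
    X = [[1, 0], [0, 1], [1, 1], [2, 1]]
    clicks = [0.5, 2.0, 2.5, 3.0]
    print(fit_linear_poisson(X, clicks))
```

In a data-parallel setting, the per-feature numerator and denominator are sums over users, which is the kind of aggregation a map and reduce phase can compute; how the authors' system actually partitions this work is not described in the abstract.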