Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. Most approaches have found it infeasible to determine the relevance of all possible queries to a given ad landing page and have focused on making recommendations from a small set of phrases extracted (and expanded) from the page using NLP and ranking-based techniques. In this paper, we eschew this paradigm and demonstrate that it is possible to efficiently predict the relevant subset of queries from a large set of monetizable ones by posing the problem as a multi-label learning task in which each query is represented by a separate label. We develop Multi-label Random Forests to tackle problems with millions of labels. Our proposed classifier has prediction costs that are logarithmic in the number of labels and can make predictions in a few milliseconds using 10 GB of RAM. We demonstrate that it is possible to generate training data for our classifier automatically from click logs without any human annotation or intervention. We train our classifier on tens of millions of labels, features and training points in less than two days on a thousand-node cluster. We develop a sparse semi-supervised multi-label learning formulation to deal with training set biases and noisy labels harvested automatically from the click logs. This formulation is used to infer a belief in the state of each label for each training ad, and the random forest classifier is extended to train on these beliefs rather than on the given labels. Experiments on a large test set of 5 million ads reveal significant gains over ranking and NLP-based techniques on multiple metrics.
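The prediction step described in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the paper's implementation: the `Node` class, the `route_to_leaf` and `predict_top_k` helpers, the feature names, and the label scores are all hypothetical. It assumes each tree routes a sparse bag-of-words vector for an ad to one leaf, that each leaf stores scores for only a few query labels, and that the forest averages the leaf scores. Under those assumptions, the cost of a prediction depends on tree depth and leaf size rather than on the total number of labels.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node: internal nodes split on one feature, leaves hold label scores."""
    feature: Optional[str] = None          # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label_scores: Dict[str, float] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return self.feature is None


def route_to_leaf(tree: Node, x: Dict[str, float]) -> Node:
    """Follow one root-to-leaf path; missing features count as 0.0 (sparse input)."""
    node = tree
    while not node.is_leaf():
        value = x.get(node.feature, 0.0)
        node = node.left if value <= node.threshold else node.right
    return node


def predict_top_k(forest: List[Node], x: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Average the sparse leaf scores over all trees and return the k best labels."""
    totals: Dict[str, float] = defaultdict(float)
    for tree in forest:
        leaf = route_to_leaf(tree, x)
        for label, score in leaf.label_scores.items():
            totals[label] += score / len(forest)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy forest with made-up features and query labels (illustrative values only).
tree_a = Node(
    feature="insurance", threshold=0.5,
    left=Node(label_scores={"cheap flights": 0.7, "hotel deals": 0.3}),
    right=Node(label_scores={"car insurance quotes": 0.8, "cheap car insurance": 0.6}),
)
tree_b = Node(
    feature="car", threshold=0.5,
    left=Node(label_scores={"life insurance": 0.5}),
    right=Node(label_scores={"cheap car insurance": 0.9, "car insurance quotes": 0.4}),
)
forest = [tree_a, tree_b]

ad_features = {"insurance": 1.0, "car": 1.0}
recommendations = predict_top_k(forest, ad_features, k=2)
print(recommendations)
```

In this toy example both trees send the ad to their right-hand leaf, so only two of the five labels stored in the forest are ever scored. The top recommendation is "cheap car insurance", followed by "car insurance quotes".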