Many web applications, such as ad-matching systems, vertical search engines, and page categorization systems, require identifying a particular type or class of pages on the Web. The sheer number and diversity of Web pages, however, make it hard to obtain a good sample of the class of interest. In this paper, we describe a successfully deployed end-to-end system that starts from a biased training sample and combines several state-of-the-art machine learning algorithms working in tandem, including a powerful active learning component, to build an effective classifier. The system is evaluated on traffic from a real-world ad-matching platform and is shown to achieve high categorization effectiveness with a significant reduction in editorial effort and labeling time.
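
The abstract does not say which classifier or query strategy the active learning component uses. As a rough illustration only, the sketch below shows one common form of the idea, pool-based uncertainty sampling: starting from a biased seed sample, the learner repeatedly asks an editor to label the pages it is least sure about. The toy logistic regression, the synthetic 2-D "pages", the `oracle` function standing in for the editor, and all parameter values are assumptions made for this example, not details of the deployed system.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    # w holds one weight per feature, with the bias stored last.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logreg(X, y, epochs=200, lr=0.1):
    # Toy logistic regression fitted by stochastic gradient descent.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w

def active_learning(pool, oracle, seed_idx, rounds=5, batch=5):
    # Labels collected so far: pool index -> label from the (hypothetical) editor.
    labeled = {i: oracle(pool[i]) for i in seed_idx}
    for _ in range(rounds):
        idx = sorted(labeled)
        w = train_logreg([pool[i] for i in idx], [labeled[i] for i in idx])
        # Uncertainty sampling: query the unlabeled items closest to p = 0.5.
        candidates = [i for i in range(len(pool)) if i not in labeled]
        candidates.sort(key=lambda i: abs(predict_proba(w, pool[i]) - 0.5))
        for i in candidates[:batch]:
            labeled[i] = oracle(pool[i])
    idx = sorted(labeled)
    w = train_logreg([pool[i] for i in idx], [labeled[i] for i in idx])
    return w, labeled

# Synthetic pool; the true class is 1 when the two features sum to more than 0.
rng = random.Random(0)
pool = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
oracle = lambda x: 1 if x[0] + x[1] > 0 else 0

# Biased seed sample: only the most extreme examples of each class.
order = sorted(range(len(pool)), key=lambda i: pool[i][0] + pool[i][1])
seed_idx = order[:5] + order[-5:]

w, labeled = active_learning(pool, oracle, seed_idx)
accuracy = sum((predict_proba(w, x) > 0.5) == bool(oracle(x)) for x in pool) / len(pool)
print(len(labeled), round(accuracy, 3))
```

In this toy setup the editor is consulted for only 35 of the 200 items, which is the kind of labeling saving the abstract refers to. A production system would presumably use a stronger classifier and real page features.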