Most supervised machine learning research assumes the training set is a random sample from the target population, so that the class distribution is invariant. In real-world situations, however, the class distribution changes, and this change is known to erode the effectiveness of classifiers and calibrated probability estimators. This paper focuses on the problem of accurately estimating the number of positives in the test set (quantification), as opposed to classifying individual cases accurately. It compares three methods: classify & count, an adjusted variant, and a mixture model. An empirical evaluation on a text classification benchmark reveals that the simple classify & count method is consistently biased, and that the mixture model is surprisingly effective even when positives are very scarce in the training set, a common case in information retrieval.
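To make the first two methods concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that the adjusted variant corrects the raw classify & count estimate using the classifier's true positive rate and false positive rate (for example, estimated by cross-validation on the training set). The abstract does not spell out the adjustment, so that formula, the function names, and the numbers in the usage example are illustrative assumptions.

```python
def classify_and_count(predictions):
    """Classify & count: the fraction of cases the classifier labels positive.

    `predictions` is a sequence of 0/1 labels produced by some classifier.
    """
    if len(predictions) == 0:
        raise ValueError("predictions must not be empty")
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Assumed form of the adjusted variant (hypothetical helper).

    If a fraction p of the test cases is truly positive, the classifier is
    expected to flag tpr * p + fpr * (1 - p) of them. Solving for p gives
    (observed - fpr) / (tpr - fpr), clipped to the valid range [0, 1].
    """
    observed = classify_and_count(predictions)
    if tpr <= fpr:
        # The correction is undefined for a classifier no better than chance;
        # fall back to the uncorrected estimate.
        return observed
    estimate = (observed - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))


# Illustrative numbers only: 1000 test cases, 200 of them truly positive,
# scored by a classifier with tpr = 0.6 and fpr = 0.05. It flags
# 0.6 * 200 + 0.05 * 800 = 160 cases as positive.
predictions = [1] * 160 + [0] * 840
raw = classify_and_count(predictions)             # underestimates the true 0.20
adjusted = adjusted_count(predictions, 0.6, 0.05)
```

In this constructed example the raw estimate is 0.16 while the true prevalence is 0.20, which illustrates how an imperfect classifier can bias classify & count. The adjustment recovers the true value here only because the assumed rates match the classifier exactly; in practice they are themselves estimates.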