In realistic settings, the prevalence of a class may change after a classifier is induced, and this change will degrade the classifier's performance. Further complicating this scenario, labeled data are often scarce and expensive. In this paper we address the problem in which the class distribution changes and only unlabeled examples are available from the new distribution. We design and evaluate several methods for coping with this problem and compare their performance. Our quantification-based methods estimate the class distribution of the unlabeled data drawn from the changed distribution and adjust the original classifier accordingly, while our semi-supervised methods build a new classifier from the unlabeled examples of the new distribution, supplemented with predicted class values. We also introduce a hybrid method that uses both quantification and semi-supervised learning. All methods are evaluated using accuracy and F-measure on a set of benchmark data sets. Our results demonstrate that these methods yield substantial improvements in accuracy and F-measure.
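To make the quantification-based idea concrete, the following is a minimal sketch of one common way to realize it for a binary problem, not necessarily the procedure used in the paper. It assumes the well-known "adjusted count" estimate of the new positive prevalence, followed by a standard prior-shift rescaling of the original classifier's posterior probabilities. The function names, and the assumption that the true and false positive rates were measured on held-out labeled data from the original distribution, are illustrative choices and do not come from the abstract.

```python
def adjusted_count(pred_pos_rate, tpr, fpr):
    """Estimate the positive-class prevalence in the new, unlabeled data.

    pred_pos_rate: fraction of new examples the original classifier labels positive.
    tpr, fpr: true/false positive rates of that classifier, assumed to have been
              measured on labeled data from the original distribution.
    """
    if tpr <= fpr:
        raise ValueError("the correction needs a classifier with tpr > fpr")
    estimate = (pred_pos_rate - fpr) / (tpr - fpr)
    # The raw estimate can fall outside [0, 1] because of sampling noise.
    return min(1.0, max(0.0, estimate))


def adjust_posterior(p_pos, train_prior, new_prior):
    """Rescale a positive-class posterior for a changed class prior.

    Assumes only the class prior changed; the class-conditional
    distributions are taken to be the same as at training time.
    Both priors must lie strictly between 0 and 1.
    """
    pos = p_pos * new_prior / train_prior
    neg = (1.0 - p_pos) * (1.0 - new_prior) / (1.0 - train_prior)
    return pos / (pos + neg)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    new_prior = adjusted_count(pred_pos_rate=0.31, tpr=0.8, fpr=0.1)
    for p in (0.2, 0.5, 0.9):
        adjusted = adjust_posterior(p, train_prior=0.5, new_prior=new_prior)
        print(f"original posterior {p:.2f} -> adjusted {adjusted:.3f}")
```

Under these assumptions, re-thresholding the adjusted posteriors at 0.5 is one way to "adjust the original classifier" without retraining it; a semi-supervised variant would instead retrain on the new examples using predicted labels.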