Aggregative quantification for regression

  • Authors:
  • Antonio Bella; Cèsar Ferri; José Hernández-Orallo; María José Ramírez-Quintana

  • Affiliations:
  • DSIC-ELP, Universitat Politècnica de València, Valencia, Spain 46022 (all authors)

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2014

Abstract

The problem of estimating the class distribution (or prevalence) for a new unlabelled dataset (from a possibly different distribution) is a common problem that has been addressed in one way or another over the past decades. This problem has recently been reconsidered as a new task in data mining, renamed quantification when the estimation is performed as an aggregation (and possible adjustment) of a single-instance supervised model (e.g., a classifier). However, the study of quantification has been limited to classification, while it is clear that this problem also appears, perhaps even more frequently, with other predictive problems, such as regression. In this case, the goal is to determine a distribution or an aggregated indicator of the output variable for a new unlabelled dataset. In this paper, we introduce a comprehensive new taxonomy of quantification tasks, distinguishing between the estimation of the whole distribution and the estimation of some indicators (summary statistics), for both classification and regression. This distinction is especially useful for regression, since predictions are numerical values that can be aggregated in many different ways, as in multi-dimensional hierarchical data warehouses. We focus on aggregative quantification for regression and show that the approaches borrowed from classification do not work. We present several techniques based on segmentation which are able to produce accurate estimations of the expected value and the distribution of the output variable. We show experimentally that these methods excel especially in the relevant scenarios where training and test distributions differ dramatically.
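
The sketch below is only an illustration of the aggregative setting described in the abstract, not the segmentation-based methods proposed in the paper: it shows the naive baseline that trains a single-instance regressor on labelled data and then aggregates its per-instance predictions to estimate the expected value of the output variable on an unlabelled deployment set. The regressor, the synthetic data and the input-distribution shift are hypothetical placeholders chosen for the example.

```python
# Illustrative sketch only: the naive "predict and aggregate" baseline for
# regression quantification (NOT the paper's segmentation-based techniques).
# The data-generating process and the distribution shift are made up here.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Labelled training data drawn from one input distribution.
X_train = rng.uniform(0, 10, size=(1000, 1))
y_train = 2.0 * X_train[:, 0] + rng.normal(0, 1, size=1000)

# Unlabelled deployment data drawn from a shifted input distribution.
X_test = rng.uniform(5, 10, size=(500, 1))
y_test = 2.0 * X_test[:, 0] + rng.normal(0, 1, size=500)  # hidden; used only to check

model = DecisionTreeRegressor(max_depth=5).fit(X_train, y_train)

# Aggregative quantification baseline: predict every instance, then aggregate.
estimated_mean = model.predict(X_test).mean()
print(f"Estimated mean of the output variable: {estimated_mean:.2f}")
print(f"True (hidden) mean of the output:      {y_test.mean():.2f}")
```

Under the dramatic training/test shifts highlighted in the abstract, this kind of naive aggregation inherits any systematic bias of the single-instance model, which is precisely the situation the paper's segmentation-based techniques are designed to handle.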