Class distribution estimation based on the Hellinger distance

Authors:
Víctor González-Castro; Rocío Alaiz-Rodríguez; Enrique Alegre
Affiliations:
Dpto. de Ingeniería Eléctrica y de Sistemas y Automática, University of León, Campus de Vegazana s/n, 24071 León, Spain (all three authors)
Venue:
Information Sciences: an International Journal
Year:
2013

Abstract

Class distribution estimation (quantification) plays an important role in many practical classification problems. First, it makes it possible to adapt a classifier to operational conditions that differ from those assumed during learning. Second, in some real domains quantification is valuable in itself because the class prior probabilities vary widely. We propose a novel quantification approach for two-class problems based on distributional divergence measures. The Hellinger distance measures the mismatch between the test data distribution and validation distributions generated in a fully controlled way, and the prior probability that minimizes this divergence is taken as the estimate. Experimental results on several binary classification problems show the benefits of this approach compared with counting the predicted class labels and with other methods based on the classifier confusion matrix or on posterior probability estimates. We also illustrate these techniques, and their robustness to the performance of the base classifier (a neural network), in a boar semen quality control setting. Empirical results show that quantification can be performed with a mean absolute error below 0.008, which appears very promising for this field.
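The divergence-minimization idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation. We assume the base classifier outputs scores in [0, 1]. The class-conditional score distributions from a labelled validation set are represented as equal-width histograms h_pos and h_neg, and a candidate positive prior p yields the controlled mixture p·h_pos + (1 - p)·h_neg. The estimate is the p on a grid that minimizes the Hellinger distance H(P, Q) = sqrt(½ Σ_i (√P_i - √Q_i)²) to the histogram of test scores. The function names (`histogram`, `hellinger`, `estimate_prior`), the bin count, the grid resolution and the synthetic Beta-distributed scores are all hypothetical choices.

```python
import math
import random


def histogram(scores, bins):
    """Normalized histogram of scores in [0, 1] using `bins` equal-width bins."""
    counts = [0] * bins
    for s in scores:
        # A score of exactly 1.0 would index past the end, so clamp it into the last bin.
        counts[min(int(s * bins), bins - 1)] += 1
    return [c / len(scores) for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def estimate_prior(val_pos, val_neg, test, bins=10, steps=100):
    """Return the positive-class prior, searched over steps + 1 grid values,
    whose validation mixture is closest (Hellinger distance) to the test scores.
    Hypothetical sketch: fixed bin count and plain grid search."""
    h_pos = histogram(val_pos, bins)
    h_neg = histogram(val_neg, bins)
    h_test = histogram(test, bins)
    best_p, best_d = 0.0, float("inf")
    for k in range(steps + 1):
        p = k / steps
        # Controlled validation distribution for the candidate prior p.
        mixture = [p * a + (1 - p) * b for a, b in zip(h_pos, h_neg)]
        d = hellinger(mixture, h_test)
        if d < best_d:
            best_p, best_d = p, d
    return best_p


if __name__ == "__main__":
    # Synthetic (hypothetical) classifier scores: positives skew high, negatives skew low.
    rng = random.Random(0)
    val_pos = [rng.betavariate(5, 2) for _ in range(2000)]
    val_neg = [rng.betavariate(2, 5) for _ in range(2000)]
    # Test set whose true positive prior is 0.3.
    test = [rng.betavariate(5, 2) for _ in range(300)] + \
           [rng.betavariate(2, 5) for _ in range(700)]
    print(estimate_prior(val_pos, val_neg, test))  # should land close to 0.3
```

Any monotone rescaling of the distance (for example, dropping the factor ½) leaves the minimizing prior unchanged. In this sketch, the bin count and the grid resolution are the main tuning knobs.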