Class distribution estimation based on the Hellinger distance

  • Authors:
  • Víctor González-Castro; Rocío Alaiz-Rodríguez; Enrique Alegre

  • Affiliations:
  • Dpto. de Ingeniería Eléctrica y de Sistemas y Automática, University of León, Campus de Vegazana s/n, 24071 León, Spain;Dpto. de Ingeniería Eléctrica y de Sistemas y Automática, University of León, Campus de Vegazana s/n, 24071 León, Spain;Dpto. de Ingeniería Eléctrica y de Sistemas y Automática, University of León, Campus de Vegazana s/n, 24071 León, Spain

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Abstract

Class distribution estimation (quantification) plays an important role in many practical classification problems. First, it is needed to adapt a classifier to operational conditions that differ from those assumed during learning. Second, in some real domains the quantification task is valuable in itself because the class prior probabilities vary widely. We propose a novel quantification approach for two-class problems based on distributional divergence measures. The Hellinger distance measures the mismatch between the test data distribution and validation distributions generated in a fully controlled way, and the estimated prior probability is the one that minimizes this divergence. Experimental results on several binary classification problems show the benefits of this approach over counting the predicted class labels and over other methods based on the classifier confusion matrix or on posterior probability estimates. We also illustrate these techniques, and their robustness to the performance of the base classifier (a neural network), in a boar semen quality control setting. Empirical results show that quantification can be carried out with a mean absolute error lower than 0.008, which seems very promising in this field.
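
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the distributions are represented, so the sketch assumes they are histograms of classifier scores in [0, 1], builds each validation distribution as a mixture of the positive-class and negative-class histograms weighted by a candidate prior, and picks the candidate with the smallest Hellinger distance to the test histogram. The function names, the number of bins, the candidate grid, and the synthetic Beta-distributed scores are all hypothetical choices made for illustration.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete distributions that each sum to 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2.0)


def normalized_hist(scores, bins):
    """Histogram of scores in [0, 1], normalized to sum to 1."""
    counts, _ = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()


def estimate_prior(val_pos_scores, val_neg_scores, test_scores, bins=10, grid=101):
    """Return the candidate positive-class prior whose mixture of validation
    histograms is closest (in Hellinger distance) to the test histogram.

    Assumption: distributions are compared as histograms of classifier scores.
    """
    h_pos = normalized_hist(val_pos_scores, bins)
    h_neg = normalized_hist(val_neg_scores, bins)
    h_test = normalized_hist(test_scores, bins)

    candidates = np.linspace(0.0, 1.0, grid)
    distances = [
        hellinger(p * h_pos + (1.0 - p) * h_neg, h_test) for p in candidates
    ]
    return float(candidates[int(np.argmin(distances))])


# Synthetic demonstration (hypothetical data, not from the paper):
# positives tend to score high, negatives tend to score low.
rng = np.random.default_rng(0)
val_pos = rng.beta(5, 2, size=5000)
val_neg = rng.beta(2, 5, size=5000)

true_prior = 0.3
test = np.concatenate([
    rng.beta(5, 2, size=1500),   # 30% positives
    rng.beta(2, 5, size=3500),   # 70% negatives
])

print("true prior:", true_prior)
print("estimated prior:", estimate_prior(val_pos, val_neg, test))
```

A grid search over candidate priors is used here only because it is simple to read; any one-dimensional minimizer could take its place.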