Divergence-based classification in learning vector quantization

  • Authors:
  • E. Mwebaze;P. Schneider;F.-M. Schleif;J. R. Aduwo;J. A. Quinn;S. Haase;T. Villmann;M. Biehl

  • Affiliations:
  • Faculty of Computing & IT, Makerere University, P.O. Box 7062, Kampala, Uganda and Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen P.O. Box 407, 9700AK Gro ...;Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen P.O. Box 407, 9700AK Groningen, The Netherlands and School of Clinical & Experimental Medicine, University ...;CITEC, University of Bielefeld, Universitätsstr. 21-23, 33615 Bielefeld, Germany;Faculty of Computing & IT, Makerere University, P.O. Box 7062, Kampala, Uganda;Faculty of Computing & IT, Makerere University, P.O. Box 7062, Kampala, Uganda;Department of MPI, University of Applied Sciences, Technikumplatz 17, 09648 Mittweida, Germany;Department of MPI, University of Applied Sciences, Technikumplatz 17, 09648 Mittweida, Germany;Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen P.O. Box 407, 9700AK Groningen, The Netherlands

  • Venue:
  • Neurocomputing
  • Year:
  • 2011

Abstract

We discuss the use of divergences in dissimilarity-based classification. Divergences can be employed whenever vectorial data consist of non-negative, potentially normalized features. This is the case, for instance, for spectral data or histograms. In particular, we introduce and study divergence-based learning vector quantization (DLVQ). We derive cost-function-based DLVQ schemes for the family of γ-divergences, which includes the well-known Kullback-Leibler divergence and the so-called Cauchy-Schwarz divergence as special cases. The corresponding training schemes are applied to two different real-world data sets. The first, a benchmark data set (Wisconsin Breast Cancer), is available in the public domain. In the second problem, color histograms of leaf images are used to detect the presence of cassava mosaic disease in cassava plants. We compare DLVQ under different parameter settings with the use of standard Euclidean distances. We show that DLVQ can yield superior classification accuracies and Receiver Operating Characteristics.
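To illustrate how a γ-divergence can replace the Euclidean distance in prototype-based classification, here is a minimal sketch. It assumes the common form D_γ(x||w) = log[ (Σ x^(γ+1))^(1/(γ(γ+1))) · (Σ w^(γ+1))^(1/(γ+1)) / (Σ x·w^γ)^(1/γ) ], which gives the Cauchy-Schwarz divergence at γ = 1 and approaches the Kullback-Leibler divergence as γ → 0 for normalized inputs. The function names `gamma_divergence` and `classify`, the normalization step, and the requirement of strictly positive prototypes are illustrative assumptions, not the authors' implementation. Only the nearest-prototype decision is shown; the cost-function-based training of the prototypes is not.

```python
import numpy as np


def gamma_divergence(x, w, gamma):
    """Hypothetical helper: gamma-divergence D_gamma(x || w) for non-negative vectors.

    Assumes x and w have positive sums and that w is strictly positive
    wherever x is (needed for the gamma == 0, Kullback-Leibler case).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    # D_gamma is invariant to rescaling either argument, so normalizing is
    # harmless for gamma > 0 and makes the gamma -> 0 limit equal to KL.
    x = x / x.sum()
    w = w / w.sum()
    if gamma == 0:
        # Kullback-Leibler limit; terms with x_j == 0 contribute nothing.
        mask = x > 0
        return float(np.sum(x[mask] * np.log(x[mask] / w[mask])))
    # Work with logarithms of the three sums to avoid overflow in the powers.
    t1 = np.log(np.sum(x ** (gamma + 1))) / (gamma * (gamma + 1))
    t2 = np.log(np.sum(w ** (gamma + 1))) / (gamma + 1)
    t3 = np.log(np.sum(x * w ** gamma)) / gamma
    return float(t1 + t2 - t3)


def classify(x, prototypes, labels, gamma):
    """Hypothetical nearest-prototype rule: return the label of the prototype
    with the smallest gamma-divergence from x."""
    d = [gamma_divergence(x, w, gamma) for w in prototypes]
    return labels[int(np.argmin(d))]


if __name__ == "__main__":
    protos = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
    print(classify([0.6, 0.3, 0.1], protos, ["A", "B"], gamma=1.0))
```

Because the γ-divergence is not symmetric, the sketch keeps the data vector as the first argument and the prototype as the second. The parameter γ interpolates between divergence measures, which is what makes a comparison across parameter settings possible.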