Adaptive local dissimilarity measures for discriminative dimension reduction of labeled data

  • Authors:
  • Kerstin Bunte;Barbara Hammer;Axel Wismüller;Michael Biehl

  • Affiliations:
  • University of Groningen, Johann Bernoulli Institute for Mathematics and Computer Science, P.O. Box 407, 9700 AK Groningen, The Netherlands;Bielefeld University, CITEC, Universitätsstraíe 23, 33615 Bielefeld, Germany;Department of Radiology, University of Rochester, 601 Elmwood Avenue, Rochester, NY 14642-648, USA and Department of Biomedical Engineering, University of Rochester, 601 Elmwood Avenue, Rochester, ...;University of Groningen, Johann Bernoulli Institute for Mathematics and Computer Science, P.O. Box 407, 9700 AK Groningen, The Netherlands

  • Venue:
  • Neurocomputing
  • Year:
  • 2010

Quantified Score

Hi-index 0.01

Visualization

Abstract

Due to the tremendous increase of electronic information with respect to the size of data sets as well as their dimension, dimension reduction and visualization of high-dimensional data has become one of the key problems of data mining. Since embedding in lower dimensions necessarily includes a loss of information, methods to explicitly control the information kept by a specific dimension reduction technique are highly desirable. The incorporation of supervised class information constitutes an important specific case. The aim is to preserve and potentially enhance the discrimination of classes in lower dimensions. In this contribution we use an extension of prototype-based local distance learning, which results in a nonlinear discriminative dissimilarity measure for a given labeled data manifold. The learned local distance measure can be used as basis for other unsupervised dimension reduction techniques, which take into account neighborhood information. We show the combination of different dimension reduction techniques with a discriminative similarity measure learned by an extension of learning vector quantization (LVQ) and their behavior with different parameter settings. The methods are introduced and discussed in terms of artificial and real world data sets.