How to quantitatively compare data dissimilarities for unsupervised machine learning?

  • Authors:
  • Bassam Mokbel, Sebastian Gross, Markus Lux, Niels Pinkwart, Barbara Hammer

  • Affiliations:
  • CITEC Centre of Excellence, Bielefeld University, Germany (B. Mokbel, M. Lux, B. Hammer); Computer Science Institute, Clausthal University of Technology, Germany (S. Gross, N. Pinkwart)

  • Venue:
  • ANNPR'12 Proceedings of the 5th INNS IAPR TC 3 GIRPR conference on Artificial Neural Networks in Pattern Recognition
  • Year:
  • 2012

Abstract

For complex data sets, the pairwise similarity or dissimilarity of data often serves as the interface between the application scenario and the machine learning tool. Hence, the final result of training is strongly influenced by the choice of the dissimilarity measure. While dissimilarity measures for supervised settings can ultimately be compared via the classification error, the situation is less clear in unsupervised domains, where a clear objective is lacking. This raises the question of how to compare dissimilarity measures and their influence on the final result in such cases. In this contribution, we propose to use a recent quantitative measure, introduced in the context of unsupervised dimensionality reduction, to assess whether, and on which scale, dissimilarities coincide for an unsupervised learning task. Essentially, the measure evaluates to what extent neighborhood relations are preserved when they are evaluated on the basis of rankings, thereby making the measure robust against scaling of the data. Beyond a global comparison, local versions allow one to highlight regions of the data where two dissimilarity measures induce the same results.