Distances in Classification

Authors:
Claus Weihs; Gero Szepannek
Affiliations:
Department of Statistics, University of Dortmund, Dortmund, 44227; Department of Statistics, University of Dortmund, Dortmund, 44227
Venue:
ICDM '09 Proceedings of the 9th Industrial Conference on Advances in Data Mining. Applications and Theoretical Aspects
Year:
2009

Abstract

The notion of distance is the most important basis for classification. This is especially true for unsupervised learning, i.e. clustering, where no objects of known group membership are available for validation. But even in supervised learning, standard distances often fail to give appropriate results. An adequate distance has to be chosen for each individual problem. This is demonstrated with three practical examples from very different application areas: social science, music science, and production economics. In social science, clustering is applied to spatial regions with very irregular borders, so adequate spatial distances may have to be taken into account. In statistical musicology, the main problem is often to find a transformation of the input time series that provides an adequate basis for defining a distance. In addition, local modelling is proposed to account for different subpopulations, e.g. instruments. In production economics, many quality criteria with very different scales often have to be taken into account. To find a compromise optimum for classification, the criteria are first transformed onto a common scale, called desirability.
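
The effect of transforming criteria onto a common scale before computing distances can be illustrated with a small sketch. The abstract does not specify which desirability functions the authors use, so the one-sided "larger is better" form below (in the style of Derringer and Suich) is an assumption, and the two quality criteria, their limits, and the three objects are hypothetical values chosen only for illustration.

```python
import math


def desirability(y, low, high, weight=1.0):
    """One-sided 'larger is better' desirability on [0, 1].

    Assumed form (Derringer-Suich style), not taken from the paper:
    0 at or below `low`, 1 at or above `high`, a power curve in between.
    """
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight


def euclidean(a, b):
    """Plain Euclidean distance between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical criteria: (strength in MPa, surface score in [0, 1]).
# Hypothetical acceptance limits (low, high) for each criterion.
LIMITS = [(200.0, 800.0), (0.0, 1.0)]


def to_desirabilities(obj):
    """Map each raw criterion value onto the common [0, 1] desirability scale."""
    return [desirability(v, lo, hi) for v, (lo, hi) in zip(obj, LIMITS)]


a = (500.0, 0.1)
b = (510.0, 0.9)  # similar strength to a, very different surface score
c = (650.0, 0.1)  # same surface score as a, higher strength

# On the raw scales the MPa axis dominates the distance.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# After the transformation both criteria contribute on the same scale.
des_ab = euclidean(to_desirabilities(a), to_desirabilities(b))
des_ac = euclidean(to_desirabilities(a), to_desirabilities(c))

print(raw_ab < raw_ac)  # a looks closer to b on the raw scales
print(des_ab > des_ac)  # a is closer to c on the desirability scale
```

In this toy setting the nearest neighbour of `a` changes from `b` to `c` once the criteria are put on the same scale, which is the kind of scale dependence the abstract refers to.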