Validating visual clusters in large datasets: fixed point clusters of spectral features

Authors:
Christian Hennig; Norbert Christlieb
Affiliations:
ETH Zürich (LEO), Seminar für Statistik, Zürich, Switzerland and Fachbereich Mathematik-SPST, Universität Hamburg, Hamburg, Germany; Universität Hamburg, Hamburger Sternwarte, Hamburg, Germany
Venue:
Computational Statistics & Data Analysis
Year:
2002

Abstract

Finding clusters in large datasets is difficult. Almost all computationally feasible methods are related to k-means and require a clear partition structure in the data, whereas most large datasets contain masking outliers and other deviations from the usual models of partitioning cluster analysis. Clusters can be sought informally with graphical tools such as the grand tour, but the meaning and validity of the patterns found this way are unclear. This paper suggests a three-step approach. In the first step, data visualization methods such as the grand tour are used to find candidate cluster subsets of the data. In the second step, reproducible clusters are generated from these candidates by fixed point clustering, a method that finds a single cluster at a time based on the Mahalanobis distance. In the third step, the validity of the clusters is assessed using classification plots. The approach is applied to an astronomical dataset of spectra from the Hamburg/ESO survey.
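
To make the second step more concrete, the following is a minimal sketch of a fixed-point iteration based on the Mahalanobis distance. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the distance threshold, the covariance estimator, or the stopping rule. Here the sketch assumes two-dimensional data, the ordinary sample mean and covariance, and a cutoff equal to the 0.99 quantile of the chi-square distribution with 2 degrees of freedom. The function names (`mean_and_cov`, `mahalanobis_sq`, `fixed_point_cluster`) and the toy data are hypothetical. Starting from a candidate subset, the cluster is repeatedly redefined as all points within the cutoff of the current mean and covariance, until the subset reproduces itself.

```python
import math


def mean_and_cov(points):
    """Sample mean and covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def fixed_point_cluster(data, start_indices, threshold, max_iter=100):
    """Iterate 'estimate mean/covariance -> reselect points' to a fixed point.

    Returns a frozenset of indices into `data`, or None if the iteration
    degenerates or does not settle within `max_iter` steps (assumed rule).
    """
    current = frozenset(start_indices)
    for _ in range(max_iter):
        if len(current) < 3:
            return None  # too few points to estimate a 2-D covariance
        mean, cov = mean_and_cov([data[i] for i in current])
        if cov[0] * cov[2] - cov[1] ** 2 <= 0.0:
            return None  # singular covariance, distance undefined
        updated = frozenset(
            i for i, p in enumerate(data)
            if mahalanobis_sq(p, mean, cov) <= threshold
        )
        if updated == current:
            return current  # the subset reproduces itself
        current = updated
    return None


# Toy data (hypothetical): two well-separated 5x5 grids of points.
offsets = [i * 0.1 for i in range(-2, 3)]
group_a = [(x, y) for x in offsets for y in offsets]
group_b = [(x + 10.0, y + 10.0) for x, y in group_a]
data = group_a + group_b

# Assumed cutoff: chi-square(2 df) quantile at 0.99, which is -2*ln(1 - 0.99).
threshold = -2.0 * math.log(0.01)

# Candidate subset, standing in for a pattern picked out visually:
# the first ten points of group_a.
cluster = fixed_point_cluster(data, range(10), threshold)
print(sorted(cluster))
```

In this toy run the candidate subset grows over a few iterations until it covers the whole first group and then stops changing, while the distant second group is never included. With real data the outcome depends strongly on the assumed cutoff and on the starting subset, which is why the approach described above follows this step with a separate validity check.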