Clusters, outliers, and regression: fixed point clusters

  • Authors:
  • Christian Hennig

  • Affiliations:
  • Fachbereich Mathematik - SPST, Universität Hamburg, Bundesstraße 55, D-20146 Hamburg, Germany and Seminar für Statistik, ETH Zentrum, CH-8092 Zürich, Switzerland

  • Venue:
  • Journal of Multivariate Analysis
  • Year:
  • 2003

Abstract

Fixed point clustering is a new stochastic approach to cluster analysis. The definition of a single fixed point cluster (FPC) is based on a simple parametric model, but, in contrast to mixture modeling and other approaches, no parametric assumption is made for the whole dataset. An FPC is defined as a data subset that is exactly the set of non-outliers with respect to its own parameter estimators. This paper concentrates on the theoretical foundation of FPC analysis as a method for clusterwise linear regression, i.e., the individual clusters are modeled as linear regressions with normal errors. In this setup, fixed point clustering is based on an iteratively reweighted estimation with zero weight for all outliers. FPCs are non-hierarchical, but they may overlap and include each other. The number of clusters does not need to be specified. Consistency results are given for certain mixture models of interest in cluster analysis. Convergence of a fixed point algorithm is shown. An application to a real dataset shows that, in the presence of deviations from the usual assumptions of model-based cluster analysis, fixed point clustering can highlight interesting features of a dataset other than those found by maximum likelihood methods.
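
The fixed point idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a point counts as a non-outlier when its squared residual is at most c times the error variance estimated from the current subset, that the variance is estimated with the usual degrees-of-freedom correction, and that c = 10 is a reasonable tuning constant. The function names `fpc_step` and `find_fpc` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def fpc_step(X, y, members, c=10.0):
    """One reweighting step: fit least squares on the current subset
    (weight 1 for members, weight 0 for everything else), then return
    the set of non-outliers with respect to that fit."""
    Xg, yg = X[members], y[members]
    beta, *_ = np.linalg.lstsq(Xg, yg, rcond=None)
    dof = int(members.sum()) - X.shape[1]
    sigma2 = np.sum((yg - Xg @ beta) ** 2) / dof
    # Assumed outlier rule: squared residual larger than c * sigma2.
    new_members = (y - X @ beta) ** 2 <= c * sigma2
    return new_members, beta, sigma2


def find_fpc(X, y, start, c=10.0, max_iter=100):
    """Iterate fpc_step from a starting subset until the subset
    reproduces itself, i.e. until a fixed point is reached."""
    members = start.copy()
    for _ in range(max_iter):
        if members.sum() <= X.shape[1]:
            return members, False  # too few points to estimate a variance
        new_members, _, _ = fpc_step(X, y, members, c)
        if np.array_equal(new_members, members):
            return members, True
        members = new_members
    return members, False


# Toy data (hypothetical): two well-separated parallel regression lines
# with small deterministic perturbations in place of random noise.
i = np.arange(50)
x = np.linspace(0.0, 10.0, 50)
y_a = 1.0 + 2.0 * x + 0.1 * np.sin(i)
y_b = 50.0 + 2.0 * x + 0.1 * np.cos(i)
X = np.column_stack([np.ones(100), np.concatenate([x, x])])
y = np.concatenate([y_a, y_b])

# Start from ten points of the first line, spread over the x range.
start = np.zeros(100, dtype=bool)
start[0:50:5] = True

members, converged = find_fpc(X, y, start)
print(converged, int(members.sum()))
```

Starting the iteration from different subsets can lead to different fixed points, which is consistent with the abstract's remark that FPCs may overlap and that the number of clusters is not fixed in advance. How starting subsets are chosen is left open in this sketch.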