Stability of density-based clustering

  • Authors:
  • Alessandro Rinaldo; Aarti Singh; Rebecca Nugent; Larry Wasserman

  • Affiliations:
  • Department of Statistics, Carnegie Mellon University, Pittsburgh, PA; Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA; Department of Statistics, Carnegie Mellon University, Pittsburgh, PA; Department of Statistics, Carnegie Mellon University, Pittsburgh, PA and Machine Learning Department

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2012

Abstract

High density clusters can be characterized by the connected components of a level set L(λ) = {x : p(x) > λ} of the underlying probability density function p generating the data, at some appropriate level λ ≥ 0. The complete hierarchical clustering can be characterized by a cluster tree T = ∪_λ L(λ). In this paper, we study the behavior of a density level set estimate L̂(λ) and a cluster tree estimate T̂ based on a kernel density estimator with kernel bandwidth h. We define two notions of instability to measure the variability of L̂(λ) and T̂ as a function of h, and investigate the theoretical properties of these instability measures.
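
To make the objects in the abstract concrete, the following is a minimal one-dimensional sketch, not the authors' procedure. It builds a Gaussian kernel density estimate with bandwidth h, thresholds it at level λ to obtain a level set estimate on a grid, counts the connected components, and reports how much the level sets computed from two halves of the sample disagree. The abstract does not state the paper's two instability definitions, so `split_instability` is a stand-in chosen only for illustration. The function names, the two-component Gaussian mixture data, the grid, and the level λ = 0.05 are likewise all assumptions.

```python
import math
import random


def kde(sample, h, grid):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm
        for x in grid
    ]


def level_set(density, lam):
    """Grid indicator of the estimated level set {x : p_hat(x) > lam}."""
    return [d > lam for d in density]


def count_components(mask):
    """Number of maximal runs of True, i.e. connected components in 1-D."""
    return sum(1 for i, m in enumerate(mask) if m and (i == 0 or not mask[i - 1]))


def split_instability(sample, h, lam, grid):
    """Fraction of grid points where level sets from two data halves disagree.

    Illustrative proxy only (a symmetric-difference measure on the grid);
    it is NOT the instability measure defined in the paper.
    """
    half = len(sample) // 2
    a = level_set(kde(sample[:half], h, grid), lam)
    b = level_set(kde(sample[half:], h, grid), lam)
    return sum(x != y for x, y in zip(a, b)) / len(grid)


# Hypothetical data: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
rng = random.Random(0)
sample = [rng.gauss(-2.0 if i % 2 == 0 else 2.0, 0.5) for i in range(400)]
grid = [-5.0 + 0.05 * i for i in range(201)]  # evaluation grid on [-5, 5]
lam = 0.05  # assumed density level

for h in (0.1, 0.3, 1.0, 3.0):
    mask = level_set(kde(sample, h, grid), lam)
    print(
        f"h={h:<4} components={count_components(mask)} "
        f"split instability={split_instability(sample, h, lam, grid):.3f}"
    )
```

Running the loop over several bandwidths shows how the estimated level set depends on h: a small h tends to give a rougher estimate, and a large h can merge the two groups into a single component. Stating that dependence precisely is the role of the instability measures the paper defines.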