The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering

  • Authors:
  • Fionn Murtagh

  • Affiliations:
  • Science Foundation Ireland, Wilton Park House, Wilton Place, Dublin 4, Ireland and University of London, Department of Computer Science, Royal Holloway, Egham, TW20 0EX, England

  • Venue:
  • Journal of Classification
  • Year:
  • 2009

Abstract

An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a hierarchical embedding. Such hierarchical structure can be global in the data set, or local. By quantifying the extent or degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sparsity increases. This leads us to assert that very high dimensional data are of simple structure. We exemplify this finding through a range of simulated data cases. We also discuss application to the segmentation and modeling of very high frequency time series.
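The abstract does not spell out how the degree of ultrametricity is measured. In an ultrametric space every triangle is isosceles, with a base no longer than its two equal sides. One way to quantify ultrametricity is therefore to sample triangles of points and count the fraction that approximately satisfy this condition, meaning their two largest angles nearly coincide. The sketch below is an assumption, not a transcription of this paper's procedure. The 2-degree tolerance, the uniformly distributed test data, and the function name `ultrametricity_coefficient` are all illustrative choices. It lets a reader check qualitatively whether the coefficient rises with dimensionality, as the abstract asserts.

```python
import numpy as np


def ultrametricity_coefficient(X, n_triangles=1000, tol_deg=2.0, seed=0):
    """Fraction of sampled triangles that are approximately ultrametric.

    Hypothetical helper: a triangle counts as ultrametric when its two
    largest angles differ by less than tol_deg degrees, i.e. it is
    isosceles with a small base (equilateral triangles included).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Draw triplets of distinct point indices.
    idx = np.array([rng.choice(n, size=3, replace=False) for _ in range(n_triangles)])
    P, Q, R = X[idx[:, 0]], X[idx[:, 1]], X[idx[:, 2]]
    # Side lengths: a is opposite vertex P, b opposite Q, c opposite R.
    a = np.linalg.norm(Q - R, axis=1)
    b = np.linalg.norm(P - R, axis=1)
    c = np.linalg.norm(P - Q, axis=1)

    def angle(opp, s1, s2):
        # Law of cosines, clipped to guard against rounding error.
        cos_t = (s1**2 + s2**2 - opp**2) / (2.0 * s1 * s2)
        return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

    angles = np.stack([angle(a, b, c), angle(b, a, c), angle(c, a, b)], axis=1)
    angles.sort(axis=1)
    # Two largest angles nearly equal -> isosceles triangle with a small base.
    return float(np.mean(angles[:, 2] - angles[:, 1] < tol_deg))


if __name__ == "__main__":
    # Illustrative run on uniform random data of increasing dimension.
    rng = np.random.default_rng(42)
    for dim in (2, 20, 200, 2000):
        X = rng.uniform(size=(300, dim))
        print(dim, ultrametricity_coefficient(X))
```

Uniform data is used only because it makes the distance-concentration effect easy to see. The printed values are not results from the paper.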