Clustering reduced interval data using Hausdorff distance

  • Authors:
  • Antonio Irpino;Valentino Tontodonato

  • Affiliations:
  • Dipartimento di Strategie Aziendali e Metodi Quantitativi, Seconda Università degli Studi di Napoli, Capua, Italy;Dipartimento di Strategie Aziendali e Metodi Quantitativi, Seconda Università degli Studi di Napoli, Capua, Italy

  • Venue:
  • Computational Statistics
  • Year:
  • 2006

Abstract

In the last decade, factorial and clustering techniques have been developed to analyze multidimensional interval data (MIDs). In classical data analysis, cluster structure is usually extracted by running PCA and then clustering on the most significant components: once the noise is filtered out, the projected data are clustered in a subspace spanned by a few orthogonal variables. We propose the same strategy for interval data. This generalization raises several computational questions. The first is how to represent the data on a factorial subspace: in classical data analysis projected points remain points, but projected MIDs do not remain MIDs. The second is the choice of a distance between the represented data: many distances between points are available, but few are defined between convex sets of points. We propose optimized techniques for representing data by convex shapes, for computing the Hausdorff distance between convex shapes based on an L2 norm, and for performing a hierarchical clustering of the projected data.
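
To make the distance concrete, the following is a minimal sketch of the Hausdorff distance under the L2 norm for the simplest convex shapes, axis-aligned boxes, which is how MIDs look before projection. It is not the authors' optimized procedure: the paper works with the convex shapes obtained after projection, which are generally not boxes. The function names and the representation of a box as a list of (lower, upper) pairs are assumptions made for illustration.

```python
import math


def directed_hausdorff_boxes(a, b):
    """Largest Euclidean distance from a point of box a to box b.

    Each box is a list of (lower, upper) pairs, one pair per variable.
    Because both sets are axis-aligned boxes, the worst case can be
    found coordinate by coordinate.
    """
    total = 0.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b):
        # How far box a sticks out of box b along this coordinate.
        gap = max(0.0, b_lo - a_lo, a_hi - b_hi)
        total += gap * gap
    return math.sqrt(total)


def hausdorff_boxes(a, b):
    """Symmetric Hausdorff distance (L2 norm) between two boxes."""
    if len(a) != len(b):
        raise ValueError("boxes must have the same number of variables")
    return max(directed_hausdorff_boxes(a, b),
               directed_hausdorff_boxes(b, a))


if __name__ == "__main__":
    # One variable: reduces to max(|0 - 2|, |1 - 4|).
    print(hausdorff_boxes([(0, 1)], [(2, 4)]))  # 3.0
    # Two variables: offsets of 3 and 4 along the two axes.
    print(hausdorff_boxes([(0, 1), (0, 1)], [(3, 4), (4, 5)]))  # 5.0
```

A matrix of such pairwise distances could then be passed to any standard agglomerative clustering routine. For general convex polygons the coordinate-wise shortcut no longer applies, and the distance has to be evaluated on the vertices and edges of the shapes.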