In the last decade, factorial and clustering techniques have been developed to analyze multidimensional interval data (MIDs). In classic data analysis, the cluster structure of a data set is usually extracted in two steps. First, a PCA is performed. Second, once the noise has been filtered out, the data are clustered on the most significant components, that is, in a subspace spanned by a few orthogonal variables. We propose the same strategy for interval data analysis. This generalization raises several computational questions. The first is how to represent the data in a factorial subspace: in classic data analysis projected points remain points, but projected MIDs do not remain MIDs. The second is the choice of a distance between the represented data: many distances between points can be computed, but few distances between convex sets of points are defined. Here we propose optimized techniques for three tasks: representing the data by convex shapes, computing the Hausdorff distance between convex shapes based on the L2 norm, and performing a hierarchical clustering of the projected data.
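The abstract does not spell out the algorithms. What follows is only a minimal Python sketch of the pipeline it describes, under several assumptions:

- Each MID is a hyperrectangle given as a list of `(lo, hi)` intervals.
- The two factorial axes `u1` and `u2` are orthonormal and already known. Here they are supplied by hand rather than computed by a PCA of interval data.
- A projected MID is represented by the convex hull of its 2^p projected vertices. This is exponential in p and is not the optimized representation the authors propose.
- Complete linkage is used for the hierarchical clustering. This is an assumed choice, since the abstract does not name a linkage.

The function names `project_mid`, `convex_hull`, `hausdorff` and `complete_linkage` are hypothetical.

```python
from itertools import product
from math import hypot


def _cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means o -> a -> b turns counter-clockwise
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def project_mid(intervals, u1, u2):
    """Project a hyperrectangle [(lo, hi), ...] onto the plane spanned by axes u1, u2.

    Naive (hypothetical) approach: project all 2**p vertices and keep their convex hull.
    """
    pts = [(sum(v * a for v, a in zip(vertex, u1)),
            sum(v * b for v, b in zip(vertex, u2)))
           for vertex in product(*intervals)]
    return convex_hull(pts)


def _point_segment_dist(p, a, b):
    # Euclidean (L2) distance from point p to segment [a, b]
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2
    t = max(0.0, min(1.0, t))
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def _dist_to_polygon(p, poly):
    # 0 if p lies inside (or on) the CCW convex polygon, else distance to its boundary
    n = len(poly)
    if n >= 3 and all(_cross(poly[i], poly[(i + 1) % n], p) >= 0 for i in range(n)):
        return 0.0
    return min(_point_segment_dist(p, poly[i], poly[(i + 1) % n]) for i in range(n))


def hausdorff(A, B):
    """Hausdorff distance (L2) between convex polygons given by CCW vertex lists.

    d(., B) is convex when B is convex, so its maximum over A is reached at a vertex of A.
    """
    return max(max(_dist_to_polygon(a, B) for a in A),
               max(_dist_to_polygon(b, A) for b in B))


def complete_linkage(D, k):
    """Naive agglomerative clustering (assumed complete linkage) until k clusters remain."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        # merge the pair of clusters whose farthest members are closest
        _, i, j = min((max(D[a][b] for a in ci for b in cj), i, j)
                      for i, ci in enumerate(clusters)
                      for j, cj in enumerate(clusters) if i < j)
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


# Four toy MIDs in R^3; the factorial axes are supplied by hand (no PCA is run here).
mids = [
    [(0, 1), (0, 1), (0, 1)],
    [(0.5, 1.5), (0, 1), (0, 1)],
    [(10, 11), (10, 11), (0, 1)],
    [(10, 12), (10, 11), (0, 1)],
]
u1, u2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
shapes = [project_mid(m, u1, u2) for m in mids]
D = [[hausdorff(s, t) for t in shapes] for s in shapes]
clusters = complete_linkage(D, k=2)
print(clusters)  # [[0, 1], [2, 3]]
```

The clustering loop is deliberately naive. SciPy's `scipy.cluster.hierarchy.linkage` accepts a condensed precomputed distance matrix, obtainable with `scipy.spatial.distance.squareform(D)`, and could replace it.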