Linear manifold clustering in high dimensional spaces by stochastic search
Pattern Recognition
MLDM'05 Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition
Journal of Network and Computer Applications
We define a cluster to be characterized by regions of high density separated by regions that are sparse. By observing the downward closure property of density, the search for interesting structure in a high dimensional space can be reduced to a search for structure in lower dimensional subspaces. We present a Hierarchical Projection Pursuit Clustering (HPPC) algorithm that repeatedly bi-partitions the dataset based on the discovered properties of interesting 1-dimensional projections. We describe a projection search procedure and a projection pursuit index function based on Cho, Haralick and Yi's improvement of the Kittler and Illingworth optimal threshold technique. The output of the algorithm is a decision tree whose nodes store a projection and threshold and whose leaves represent the clusters (classes). Experiments with various real and synthetic datasets show the effectiveness of the approach.
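The recursive bi-partitioning described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the threshold uses the plain Kittler and Illingworth minimum-error criterion evaluated on the sorted sample (not the Cho, Haralick and Yi improvement), the projection search simply tries random unit directions, and the index is the drop in the criterion relative to a single-Gaussian fit, which also serves as the stopping rule. All function names and parameters (`min_error_threshold`, `best_projection`, `build_tree`, `n_trials`, `min_size`, `max_depth`) are hypothetical.

```python
import math
import random

def min_error_threshold(values, min_size=5):
    """Plain Kittler-Illingworth criterion, evaluated at every split of the
    sorted sample: J = 1 + P1*ln(v1) + P2*ln(v2) - 2*(P1*ln(P1) + P2*ln(P2)),
    where v1, v2 are the variances on each side. Returns (J, threshold) for
    the smallest J, or (inf, None) when no valid split exists."""
    xs = sorted(values)
    n = len(xs)
    s1, s2 = [0.0], [0.0]              # prefix sums of x and x^2
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)
    best = (math.inf, None)
    for k in range(min_size, n - min_size + 1):
        if xs[k - 1] == xs[k]:
            continue                   # cannot place a threshold between ties
        p1, p2 = k / n, (n - k) / n
        m1, m2 = s1[k] / k, (s1[n] - s1[k]) / (n - k)
        v1 = max(s2[k] / k - m1 * m1, 1e-12)
        v2 = max((s2[n] - s2[k]) / (n - k) - m2 * m2, 1e-12)
        j = (1 + p1 * math.log(v1) + p2 * math.log(v2)
             - 2 * (p1 * math.log(p1) + p2 * math.log(p2)))
        if j < best[0]:
            best = (j, (xs[k - 1] + xs[k]) / 2)
    return best

def project(points, w):
    """Project each point onto direction w."""
    return [sum(a * b for a, b in zip(p, w)) for p in points]

def best_projection(points, rng, n_trials=50, min_size=5):
    """Assumed search: try random unit directions and keep the one whose best
    two-way split most improves on a single-Gaussian fit (scale-invariant)."""
    dim = len(points[0])
    best = (0.0, None, None)           # (gain, direction, threshold)
    for _ in range(n_trials):
        w = [rng.gauss(0, 1) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
        z = project(points, w)
        j_split, t = min_error_threshold(z, min_size)
        if t is None:
            continue
        mean = sum(z) / len(z)
        var = max(sum((v - mean) ** 2 for v in z) / len(z), 1e-12)
        gain = (1 + math.log(var)) - j_split
        if gain > best[0]:
            best = (gain, w, t)
    return best

def build_tree(points, rng, depth=0, max_depth=4, min_size=5):
    """Internal nodes store a projection and threshold; leaves hold clusters."""
    if depth < max_depth and len(points) >= 2 * min_size:
        _, w, t = best_projection(points, rng, min_size=min_size)
        if w is not None:              # a split with positive gain was found
            z = project(points, w)
            left = [p for p, v in zip(points, z) if v <= t]
            right = [p for p, v in zip(points, z) if v > t]
            return {"w": w, "t": t,
                    "left": build_tree(left, rng, depth + 1, max_depth, min_size),
                    "right": build_tree(right, rng, depth + 1, max_depth, min_size)}
    return {"leaf": points}

def leaves(node):
    """Collect the clusters stored at the leaves, left to right."""
    if "leaf" in node:
        return [node["leaf"]]
    return leaves(node["left"]) + leaves(node["right"])
```

Calling `build_tree(points, random.Random(0))` on a list of equal-length tuples returns a nested dictionary; `leaves(tree)` then lists the resulting clusters. Comparing each split against a single-Gaussian fit keeps the index independent of the projection's scale, which the raw criterion is not.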