Projection methods for dimension reduction have enabled the discovery of structure in ultra-high-dimensional data that would otherwise be unattainable. More recently, one such method, Random Projection, has been shown to yield high-quality data representations with minimal computational effort, even when the data dimension is in the hundreds of thousands or millions. Here, we couple this dimension reduction technique with data clustering algorithms designed specifically for high-dimensional data. We first show that the theoretical properties of the two components can be combined in a sound manner, which promises an effective clustering framework. The experimental analysis then shows that, across a series of simulated and real ultra-high-dimensional data scenarios, the resulting algorithms achieve high-quality data partitions orders of magnitude faster.
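The coupling described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a dense Gaussian projection matrix and a single two-way split along the first principal direction of the projected data. The function names (`random_projection`, `principal_direction_split`), the synthetic two-cluster data, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project the rows of X (n x d) onto k dimensions with a Gaussian random matrix."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Scaling by 1/sqrt(k) keeps squared Euclidean norms unbiased in expectation.
    R = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ R

def principal_direction_split(Y):
    """Split the rows of Y in two by the sign of their projection
    onto the first principal direction of the centered data."""
    centered = Y - Y.mean(axis=0)
    # The first right singular vector is the first principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0] > 0).astype(int)

# Synthetic data (an assumption for this sketch): two well-separated
# Gaussian clusters in d = 5000 dimensions.
rng = np.random.default_rng(42)
n, d, k = 100, 5000, 50
centers = np.zeros((2, d))
centers[1, :] = 2.0
truth = np.repeat([0, 1], n // 2)
X = centers[truth] + rng.standard_normal((n, d))

Y = random_projection(X, k)            # n x k instead of n x d
labels = principal_direction_split(Y)  # clustering runs in the reduced space

# Cluster labels are arbitrary, so compare up to a label swap.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"projected shape: {Y.shape}, agreement with ground truth: {agreement:.2f}")
```

In this sketch the singular value decomposition runs on a 100 x 50 matrix rather than a 100 x 5000 one, which is where the computational saving would come from. A full divisive scheme would apply the split recursively to the resulting partitions.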