High-dimensional data arise naturally in many domains and regularly pose a great challenge to traditional data-mining techniques, in terms of both effectiveness and efficiency. Clustering becomes difficult due to the increasing sparsity of such data, as well as the increasing difficulty of distinguishing between the distances of different data points. In this paper we take a novel perspective on the problem of clustering high-dimensional data. Instead of attempting to avoid the curse of dimensionality by restricting attention to a lower-dimensional feature subspace, we embrace dimensionality by taking advantage of inherently high-dimensional phenomena. More specifically, we show that hubness, i.e., the tendency of high-dimensional data to contain points (hubs) that frequently occur in the k-nearest-neighbor lists of other points, can be successfully exploited in clustering. We validate our hypothesis by proposing several hubness-based clustering algorithms and testing them on high-dimensional data. Experimental results demonstrate good performance of our algorithms in multiple settings, particularly in the presence of large quantities of noise.
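The hubness phenomenon the abstract refers to can be made concrete by computing each point's k-occurrence score N_k(x): the number of other points whose k-nearest-neighbor lists contain x. The following is a minimal sketch (not the authors' implementation) on synthetic Gaussian data; the function name `k_occurrence` and all parameter choices are illustrative assumptions.

```python
# Sketch: k-occurrence (hubness) scores on synthetic high-dimensional data.
# The function name and data generation are illustrative, not from the paper.
import numpy as np

def k_occurrence(X, k):
    """Count how often each point appears in the k-nearest-neighbor
    lists of the other points (its N_k score)."""
    n = len(X)
    # Pairwise squared Euclidean distances (fine for small n).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    counts = np.zeros(n, dtype=int)
    for i in range(n):
        for j in np.argsort(d[i])[:k]:  # k nearest neighbors of point i
            counts[j] += 1
    return counts

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))  # 200 points in 50 dimensions
scores = k_occurrence(X, k=10)
# The mean N_k score is always exactly k (n*k neighbor slots over n points),
# but in high dimensions the distribution is skewed: a few hubs score far
# above the mean while many anti-hubs score near zero.
print(scores.mean(), scores.max())
```

A skewed N_k distribution (maximum well above the mean of k) is the signal that the proposed algorithms exploit when selecting hubs as cluster prototypes.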