High dimensionality can pose severe difficulties, widely recognized as different aspects of the curse of dimensionality. In this paper we study a new aspect of the curse pertaining to the distribution of k-occurrences, i.e., the number of times a point appears among the k nearest neighbors of other points in a data set. We show that, as dimensionality increases, this distribution becomes considerably skewed and hub points emerge (points with very high k-occurrences). We examine the origin of this phenomenon, showing that it is an inherent property of high-dimensional vector space, and explore its influence on applications based on measuring distances in vector spaces, notably classification, clustering, and information retrieval.
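As an illustration of the quantity defined above, the following is a minimal sketch, not the authors' experimental code. It counts k-occurrences for synthetic i.i.d. Gaussian points and compares the skewness of the resulting distribution at a low and a high dimensionality. The function names `k_occurrences` and `skewness`, the Euclidean metric, the sample size, and the choice k = 5 are all assumptions made for this example.

```python
import numpy as np


def k_occurrences(X, k):
    """Count, for each row of X, how often it appears among the k nearest
    neighbors (Euclidean distance assumed) of the other rows."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # A point is never counted as its own neighbor.
    np.fill_diagonal(d2, np.inf)
    # Indices of the k smallest distances in each row (order within the k is irrelevant).
    nn = np.argpartition(d2, k, axis=1)[:, :k]
    # Tally how many neighbor lists each point appears in.
    return np.bincount(nn.ravel(), minlength=n)


def skewness(counts):
    """Standardized third moment of the k-occurrence distribution."""
    c = counts.astype(float)
    centered = c - c.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
    n, k = 1000, 5                  # illustrative values only
    for d in (3, 100):
        X = rng.standard_normal((n, d))
        counts = k_occurrences(X, k)
        print(f"d={d}: max k-occurrence={counts.max()}, "
              f"skewness={skewness(counts):.2f}")
```

Every point contributes exactly k neighbor slots, so the mean k-occurrence is always k regardless of dimensionality. The change is in the shape of the distribution: in the higher-dimensional setting a small number of points should collect a disproportionate share of those slots, which shows up as a larger skewness and a larger maximum count.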