Nearest neighbors in high-dimensional data: the emergence and influence of hubs

Authors:
Miloš Radovanović (University of Novi Sad, Novi Sad, Serbia)
Alexandros Nanopoulos (University of Hildesheim, Hildesheim, Germany)
Mirjana Ivanović (University of Novi Sad, Novi Sad, Serbia)
Venue:
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Year:
2009

Abstract

High dimensionality can pose severe difficulties, widely recognized as different aspects of the curse of dimensionality. In this paper we study a new aspect of the curse pertaining to the distribution of k-occurrences, i.e., the number of times a point appears among the k nearest neighbors of other points in a data set. We show that, as dimensionality increases, this distribution becomes considerably skewed and hub points emerge (points with very high k-occurrences). We examine the origin of this phenomenon, showing that it is an inherent property of high-dimensional vector space, and explore its influence on applications based on measuring distances in vector spaces, notably classification, clustering, and information retrieval.
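
The k-occurrence count described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes Euclidean distance, i.i.d. standard normal data, and brute-force neighbor search, and the names k_occurrences, skewness, and skew_by_dimension are hypothetical helpers introduced only for this illustration. For each point it counts how often that point appears among the k nearest neighbors of the other points, then reports the skewness of those counts for a few dimensionalities.

```python
import numpy as np


def k_occurrences(X, k):
    """Count, for each row of X, how many other rows have it among their k nearest neighbors."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # A point is never counted as its own neighbor.
    np.fill_diagonal(d2, np.inf)
    # Column indices of the k smallest distances in each row (order within the k is arbitrary).
    knn = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=X.shape[0])


def skewness(values):
    """Standardized third moment; larger positive values mean a longer right tail."""
    v = np.asarray(values, dtype=float)
    c = v - v.mean()
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5


def skew_by_dimension(dims, n=2000, k=5, seed=0):
    """Skewness of the k-occurrence counts for synthetic Gaussian data (an assumed setup)."""
    rng = np.random.default_rng(seed)
    return {d: skewness(k_occurrences(rng.standard_normal((n, d)), k)) for d in dims}


for d, s in skew_by_dimension((3, 20, 100)).items():
    print(f"d={d:3d}  skewness of k-occurrences = {s:.2f}")
```

Every point contributes exactly k neighbor slots, so the counts always average k; what changes with dimensionality is how unevenly those slots are shared, which is what the skewness value summarizes. The sample size, k, and the chosen dimensionalities are arbitrary illustration values.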