Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empirical analysis involving synthetic and real data sets we show that under commonly used assumptions this distribution becomes considerably skewed as dimensionality increases, causing the emergence of hubs, that is, points with very high k-occurrences which effectively represent "popular" nearest neighbors. We examine the origins of this phenomenon, showing that it is an inherent property of data distributions in high-dimensional vector space, discuss its interaction with dimensionality reduction, and explore its influence on a wide range of machine-learning tasks directly or indirectly based on measuring distances, belonging to supervised, semi-supervised, and unsupervised learning families.
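The k-occurrence count described above is easy to compute directly. The following is a minimal sketch, not the authors' code. It assumes Euclidean distance, brute-force neighbor search, and i.i.d. uniform synthetic data, and the function names (`k_occurrences`, `skewness`, `iid_uniform`) are hypothetical. Skewness is measured as the standardized third moment, E[(N_k − μ)³]/σ³. Because every point contributes exactly k neighbors, the counts always sum to n·k, so the mean of N_k is k. Hubness therefore appears as a long right tail rather than as a shift in the average.

```python
import random

def k_occurrences(points, k):
    """Return N_k(x) for each point: how many times it appears among
    the k nearest neighbors (Euclidean) of the other points."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Squared Euclidean distances from point i to every other point;
        # ties are broken by index because tuples sort lexicographically.
        dists = []
        for j in range(n):
            if j != i:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                dists.append((d2, j))
        dists.sort()
        for _, j in dists[:k]:
            counts[j] += 1
    return counts

def skewness(values):
    """Population skewness: standardized third moment E[(X - mu)^3] / sigma^3."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    third = sum((v - mu) ** 3 for v in values) / n
    return third / var ** 1.5

def iid_uniform(n, d, rng):
    """n points drawn i.i.d. from the uniform distribution on [0, 1]^d."""
    return [[rng.random() for _ in range(d)] for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    k = 5
    for d in (3, 20, 100):
        counts = k_occurrences(iid_uniform(200, d, rng), k)
        print(f"d={d:3d}  skew(N_{k})={skewness(counts):.2f}  max N_{k}={max(counts)}")
```

Running the script on increasing dimensionality should show the skewness and the largest N_k growing, in line with the trend the abstract describes. Exact values depend on the seed and sample size. Brute-force search is O(n²d), which is fine for a demonstration. Larger data would call for an indexed or library k-NN search.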