Learning from high-dimensional data is usually quite challenging, as captured by the well-known phrase "curse of dimensionality." Most distance-based methods become impaired by the distance concentration of many widely used metrics in high-dimensional spaces. One recently proposed approach suggests that using secondary distances, based on the number of shared k-nearest neighbors between points, might partly resolve the concentration issue and thereby improve overall performance. Nevertheless, the curse of dimensionality also affects k-nearest-neighbor inference in severely negative ways, one such consequence being known as hubness. The impact of hubness on forming shared-neighbor distances has not been discussed before, and it is what we focus on in this paper. Furthermore, we propose a new method for calculating the secondary distances that is aware of the underlying neighbor-occurrence distribution. Our experiments suggest that this new approach achieves consistently superior performance on all considered high-dimensional data sets. An additional benefit is that it requires essentially no extra computation compared to the original methods.
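To make the idea of shared-neighbor secondary distances concrete, here is a minimal illustrative sketch (not the paper's proposed hubness-aware method): a brute-force secondary distance defined as one minus the fraction of shared k-nearest neighbors, plus the k-occurrence counts whose skew signals hubness. The function names (`knn_lists`, `snn_distance`) and parameter choices are ours, for illustration only.

```python
# Illustrative sketch of shared-nearest-neighbor (SNN) secondary distances.
# This is NOT the paper's hubness-aware variant, just the baseline idea:
# secondary distance = 1 - (shared k-NN count) / k.
import numpy as np

def knn_lists(X, k):
    # Brute-force k-nearest-neighbor lists under Euclidean (primary) distance.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point as its own neighbor
    return np.argsort(d, axis=1)[:, :k]  # indices of the k closest points

def snn_distance(X, k=5):
    # Secondary distance matrix from overlap of k-NN lists.
    nn = knn_lists(X, k)
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(set(nn[i]) & set(nn[j]))
            D[i, j] = D[j, i] = 1.0 - shared / k
    return D

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))            # 20 points in a 50-dimensional space
D = snn_distance(X, k=5)

# k-occurrence counts: how often each point appears in others' k-NN lists.
# A strongly skewed distribution of these counts is the hubness phenomenon.
occ = np.bincount(knn_lists(X, 5).ravel(), minlength=len(X))
```

Note that `occ` sums to n*k by construction; under hubness, a few "hub" points absorb a disproportionate share of these occurrences, which is exactly what distorts the shared-neighbor overlap counts above.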