'Hubness' has recently been identified as a general problem of high-dimensional data spaces. It manifests itself in the emergence of so-called hubs: objects that tend to be among the k nearest neighbors of a large number of data items. As a consequence, many nearest neighbor relations in the distance space are asymmetric, that is, object y is among the nearest neighbors of x but x is not among the nearest neighbors of y. The work presented here discusses two classes of methods that try to symmetrize nearest neighbor relations and investigates to what extent they can mitigate the negative effects of hubs. We evaluate local distance scaling and propose a global variant, which has the advantages of being easy to approximate for large data sets and of having a probabilistic interpretation. Both the local and the global approach are shown to be effective, especially for high-dimensional data sets, which are affected by high hubness. Both methods strongly decrease hubness in these data sets while at the same time improving properties such as classification accuracy. We evaluate the methods on a large number of public machine learning data sets and on synthetic data. Finally, we present a real-world application in which we achieve significantly higher retrieval quality.
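To make the quantities in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It counts how often each object appears in other objects' k-nearest-neighbor lists, measures the share of asymmetric neighbor relations, and applies one local and one global rescaling of a distance matrix. The abstract gives no formulas, so everything specific here is an assumption: the local variant uses the common form 1 - exp(-d²/(σₓσᵧ)) with σ taken as the distance to the k-th neighbor, the global variant uses an empirical "fraction of objects farther from both x and y" estimate, skewness of the k-occurrence counts stands in for a hubness measure, and all function names are hypothetical.

```python
import numpy as np


def pairwise_euclidean(X):
    """Full Euclidean distance matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)   # clip tiny negatives from rounding
    np.fill_diagonal(d2, 0.0)     # self-distance is exactly zero
    return np.sqrt(d2)


def knn_indices(D, k):
    """Indices of the k nearest neighbors of each object, excluding itself."""
    Dn = D.astype(float).copy()
    np.fill_diagonal(Dn, np.inf)
    return np.argsort(Dn, axis=1)[:, :k]


def k_occurrence(D, k):
    """How often each object appears in the k-NN lists of the other objects."""
    return np.bincount(knn_indices(D, k).ravel(), minlength=D.shape[0])


def skewness(v):
    """Sample skewness; used here as an assumed hubness indicator."""
    c = np.asarray(v, dtype=float)
    c = c - c.mean()
    return np.mean(c ** 3) / np.mean(c ** 2) ** 1.5


def asymmetric_fraction(D, k):
    """Share of k-NN relations x -> y for which y -> x does not hold."""
    n = D.shape[0]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), knn_indices(D, k).ravel()] = True
    return (A & ~A.T).sum() / A.sum()


def local_scaling(D, k):
    """Assumed local variant: rescale each distance by both objects'
    k-th-neighbor distances (sigma)."""
    Dn = D.astype(float).copy()
    np.fill_diagonal(Dn, np.inf)
    sigma = np.sort(Dn, axis=1)[:, k - 1]
    return 1.0 - np.exp(-(D ** 2) / np.outer(sigma, sigma))


def global_scaling(D):
    """Assumed global variant: 1 minus the fraction of objects that are
    farther from x than y is AND farther from y than x is.
    Naive O(n^3) memory; only suitable for small n."""
    farther = D[:, None, :] > D[:, :, None]   # farther[i, j, m]: D[i, m] > D[i, j]
    mp = (farther & farther.transpose(1, 0, 2)).mean(axis=2)
    return 1.0 - mp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 100))   # synthetic high-dimensional data
    D = pairwise_euclidean(X)
    k = 5
    for name, M in [("original", D),
                    ("local scaling", local_scaling(D, k)),
                    ("global scaling", global_scaling(D))]:
        print(name,
              "skewness of k-occurrence:", round(float(skewness(k_occurrence(M, k))), 3),
              "asymmetric fraction:", round(float(asymmetric_fraction(M, k)), 3))
```

Running the script prints both indicators for the original and the rescaled distances on synthetic Gaussian data; the values depend on the data, the dimensionality and k. The global variant as written compares every pair against all other objects, which is why the sketch restricts itself to a few hundred points.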