High-dimensional data are by their very nature often difficult for conventional machine-learning algorithms to handle, which is usually characterized as an aspect of the curse of dimensionality. However, it has been shown that some of the arising high-dimensional phenomena can be exploited to increase algorithm accuracy. One such phenomenon is hubness, which refers to the emergence of hubs in high-dimensional spaces, where hubs are influential points included in many k-neighbor sets of other points in the data. This phenomenon was previously used to devise a crisp weighted voting scheme for the k-nearest neighbor classifier. In this paper we go a step further by embracing the soft approach and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express the fuzziness of a point's class membership in terms of its occurrences in the k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in an improvement over both the crisp weighted method and the standard kNN classifier.
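The core idea can be sketched in code. The following is a minimal, illustrative implementation (not the paper's exact measures): for each training point x we count N_k,c(x), the number of times x occurs in the k-neighbor sets of points of class c, turn these reverse-neighbor counts into smoothed fuzzy class memberships, and then classify a query by summing its neighbors' memberships instead of casting crisp votes. All function names, the Laplace smoothing, and the brute-force neighbor search are assumptions made for the sketch.

```python
# Illustrative sketch of hubness-based fuzzy kNN voting.
# Assumed design: fuzzy membership of x in class c is derived from
# N_{k,c}(x), the count of x's occurrences in k-neighbor sets of
# class-c points, with Laplace smoothing for anti-hubs that never occur.
from collections import defaultdict

def knn_indices(points, k):
    """Brute-force k-nearest-neighbor lists under squared Euclidean distance."""
    neighbor_lists = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        neighbor_lists.append([j for _, j in dists[:k]])
    return neighbor_lists

def hubness_fuzzy_memberships(points, labels, k, smoothing=1.0):
    """Fuzzy class memberships u_c(x) from reverse-neighbor counts:
    u_c(x) = (N_{k,c}(x) + s) / (N_k(x) + |C| * s)."""
    classes = sorted(set(labels))
    occ = defaultdict(lambda: defaultdict(int))  # occ[x][c] = N_{k,c}(x)
    for i, nbrs in enumerate(knn_indices(points, k)):
        for j in nbrs:
            occ[j][labels[i]] += 1
    memberships = []
    for i in range(len(points)):
        total = sum(occ[i].values())  # N_k(i), zero for anti-hubs
        memberships.append({
            c: (occ[i][c] + smoothing) / (total + len(classes) * smoothing)
            for c in classes
        })
    return memberships

def fuzzy_knn_predict(points, memberships, query, k):
    """Soft voting: sum the fuzzy memberships of the query's k neighbors."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(query, q)), j)
        for j, q in enumerate(points)
    )
    votes = defaultdict(float)
    for _, j in dists[:k]:
        for c, u in memberships[j].items():
            votes[c] += u
    return max(votes, key=votes.get)

# Toy usage: two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
memberships = hubness_fuzzy_memberships(points, labels, k=3)
print(fuzzy_knn_predict(points, memberships, (0.5, 0.5), k=3))  # → 0
```

Because the votes are membership sums rather than hard labels, the margin between the winning and runner-up classes doubles as the confidence measure the abstract refers to.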