Hubness-aware shared neighbor distances for high-dimensional k-nearest neighbor classification

  • Authors: Nenad Tomašev; Dunja Mladenić
  • Affiliations: Artificial Intelligence Laboratory, Institute Jožef Stefan, Ljubljana, Slovenia (both authors)
  • Venue: HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part II
  • Year: 2012

Abstract

Learning from high-dimensional data is usually a challenging task, as captured by the well-known phrase "curse of dimensionality". Most distance-based methods are impaired by the distance concentration that many widely used metrics exhibit in high-dimensional spaces. One recently proposed approach suggests that secondary distances based on the number of k-nearest neighbors shared between points might partly resolve the concentration issue and thereby improve overall performance. However, the curse of dimensionality also severely affects k-nearest neighbor inference itself, one such consequence being known as hubness. The impact of hubness on forming shared neighbor distances has not been discussed before, and it is the focus of this paper. Furthermore, we propose a new method for calculating the secondary distances that is aware of the underlying neighbor occurrence distribution. Our experiments suggest that this new approach achieves consistently superior performance on all considered high-dimensional data sets. An additional benefit is that it requires essentially no extra computation compared to the original methods.
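The two quantities the abstract builds on can be illustrated with a minimal sketch. It computes a plain shared-neighbor secondary distance (one minus the fraction of shared k-nearest neighbors) and the neighbor occurrence counts whose skewed distribution is what hubness refers to. This is not the hubness-aware method proposed in the paper, whose weighting is not given in the abstract. The function names, the Euclidean primary metric, and the brute-force neighbor search are all assumptions made for illustration.

```python
import numpy as np


def knn_indices(X, k):
    """Return an (n, k) array with each point's k nearest neighbors.

    Assumes Euclidean distance as the primary metric and uses a
    brute-force search, which is only suitable for small data sets.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def shared_neighbor_distance(neighbors):
    """Plain shared-neighbor secondary distance: 1 - |shared| / k."""
    n, k = neighbors.shape
    # member[i, j] == 1 when point j is among the k neighbors of point i
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], neighbors] = 1
    shared = member @ member.T  # shared[i, j] = size of the overlap
    return 1.0 - shared / k


def neighbor_occurrences(neighbors):
    """Count how often each point appears in the others' k-NN lists."""
    return np.bincount(neighbors.ravel(), minlength=neighbors.shape[0])


# Illustrative usage on synthetic data (hypothetical sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
neighbors = knn_indices(X, k=10)
secondary = shared_neighbor_distance(neighbors)
occurrences = neighbor_occurrences(neighbors)
print("largest occurrence count:", occurrences.max())
print("points never chosen as a neighbor:", int((occurrences == 0).sum()))
```

The occurrence counts always sum to n times k, so a few points with very high counts imply many points with counts near zero. A hubness-aware variant would presumably let each shared neighbor contribute differently depending on how often it occurs, rather than counting every shared neighbor equally as this sketch does.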