Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification

  • Authors:
  • Nenad Tomašev;Miloš Radovanović;Dunja Mladenić;Mirjana Ivanović

  • Affiliations:
  • Institute Jožef Stefan, Artificial Intelligence Laboratory, Ljubljana, Slovenia;University of Novi Sad, Department of Mathematics and Informatics, Novi Sad, Serbia;Institute Jožef Stefan, Artificial Intelligence Laboratory, Ljubljana, Slovenia;University of Novi Sad, Department of Mathematics and Informatics, Novi Sad, Serbia

  • Venue:
  • MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
  • Year:
  • 2011

Abstract

High-dimensional data are by their very nature often difficult to handle by conventional machine-learning algorithms, which is usually characterized as an aspect of the curse of dimensionality. However, it has been shown that some of the arising high-dimensional phenomena can be exploited to increase algorithm accuracy. One such phenomenon is hubness, which refers to the emergence of hubs in high-dimensional spaces, where hubs are influential points included in many k-neighbor sets of other points in the data. This phenomenon was previously used to devise a crisp weighted voting scheme for the k-nearest neighbor classifier. In this paper we go a step further by embracing the soft approach, and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express fuzziness of elements appearing in k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvement over the crisp weighted method, as well as the standard kNN classifier.
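
The notion of hubness described in the abstract can be illustrated with a minimal sketch that counts how often each point appears in the k-neighbor sets of other points. This is not the paper's set of fuzzy measures, which the abstract does not define. The function names (`knn_indices`, `k_occurrences`, `class_k_occurrences`), the brute-force Euclidean neighbor search, and the integer class labels are all assumptions made for illustration.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X.

    Assumes Euclidean distance and excludes each point from its own
    neighbor set. Brute force, so only suitable for small data sets.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def k_occurrences(neighbors, n_points):
    """Number of k-neighbor sets in which each point appears."""
    return np.bincount(neighbors.ravel(), minlength=n_points)

def class_k_occurrences(neighbors, labels, n_classes):
    """Occurrences of each point in k-neighbor sets, split by the class
    of the point whose neighbor set it appears in.

    counts[x, c] is the number of points labeled c that have x among
    their k nearest neighbors.
    """
    counts = np.zeros((len(labels), n_classes), dtype=int)
    # neighbors[i, j] is a neighbor of point i, whose label is labels[i]
    np.add.at(counts, (neighbors, labels[:, None]), 1)
    return counts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))       # synthetic high-dimensional data
    labels = rng.integers(0, 2, size=200)
    nn = knn_indices(X, k=5)
    occ = k_occurrences(nn, len(X))
    print("largest k-occurrence count:", occ.max())
    print("points never chosen as a neighbor:", int((occ == 0).sum()))
```

The occurrence counts always sum to the number of points times k, so a strongly skewed distribution of counts means that a few points (the hubs) absorb many neighbor slots while others are never chosen. Class-conditional counts of this kind are one plausible starting point for a soft, per-class vote, though the specific fuzzy measures evaluated in the paper are not reproduced here.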