Improving k-nearest neighbors algorithms: practical application of dataset analysis

  • Authors:
  • Fidel Cacheda;Victor Carneiro;Diego Fernández;Vreixo Formoso

  • Affiliations:
  • University of A Coruña, A Coruña, Spain;University of A Coruña, A Coruña, Spain;University of A Coruña, A Coruña, Spain;University of A Coruña, A Coruña, Spain

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

In the last years, recommender systems have achieved a great popularity. Many different techniques have been developed and applied to this field. However, in many cases the algorithms do not obtain the expected results. In particular, when the applied model does not fit the real data the results are especially bad. This happens because many times models are directly applied to a domain without a previous analysis of the data. In this work we study the most popular datasets in the movie recommendation domain, in order to understand how the users behave in this particular context. We have found some remarkable facts that question the utility of the similarity measures traditionally used in k-Nearest Neighbors (kNN) algorithms. These findings can be useful in order to develop new algorithms. In particular, we modify traditional kNN algorithms by introducing a new similarity measure specially suited for sparse contexts, where users have rated very few items. Our experiments show slight improvements in prediction accuracy, which proves the importance of a thorough dataset analysis as a previous step to any algorithm development.