Improving k-nearest neighbors algorithms: practical application of dataset analysis

Authors:
Fidel Cacheda; Victor Carneiro; Diego Fernández; Vreixo Formoso
Affiliations:
University of A Coruña, A Coruña, Spain; University of A Coruña, A Coruña, Spain; University of A Coruña, A Coruña, Spain; University of A Coruña, A Coruña, Spain
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

In recent years, recommender systems have become very popular, and many different techniques have been developed and applied to this field. In many cases, however, the algorithms do not obtain the expected results. Results are especially poor when the applied model does not fit the real data, which often happens because models are applied directly to a domain without a prior analysis of the data. In this work we study the most popular datasets in the movie recommendation domain in order to understand how users behave in this particular context. We have found some remarkable facts that question the utility of the similarity measures traditionally used in k-Nearest Neighbors (kNN) algorithms. These findings can be useful for developing new algorithms. In particular, we modify traditional kNN algorithms by introducing a new similarity measure specially suited for sparse contexts, where users have rated very few items. Our experiments show slight improvements in prediction accuracy, which underscores the importance of a thorough dataset analysis as a preliminary step in any algorithm development.
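
The abstract does not specify the new similarity measure, so the following is only a minimal sketch of the traditional baseline it refers to: user-based kNN with Pearson correlation computed over co-rated items. The function names `pearson` and `predict`, the toy ratings, and the choice of Pearson correlation with mean-centered prediction are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)


def user_mean(user_ratings):
    """Average of all ratings given by one user."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict(ratings, user, item, k=2):
    """Predict a rating as the user's mean plus the similarity-weighted
    mean-centered ratings of the k most similar users who rated the item."""
    base = user_mean(ratings[user])
    candidates = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim > 0:  # keep positively correlated neighbors only
            candidates.append((sim, other))
    neighbors = sorted(candidates, reverse=True)[:k]
    if not neighbors:
        return base  # fall back to the user's own mean
    num = sum(sim * (ratings[o][item] - user_mean(ratings[o])) for sim, o in neighbors)
    den = sum(sim for sim, _ in neighbors)
    return base + num / den


# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}
ratings = {
    "ana": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "eve": {"m1": 1, "m2": 5, "m4": 2},
}

sim_ana_bob = pearson(ratings["ana"], ratings["bob"])
sim_ana_eve = pearson(ratings["ana"], ratings["eve"])  # only two co-rated items
prediction = predict(ratings, "ana", "m4")
```

In this toy data "ana" and "eve" share only two rated movies, and with two co-rated items the Pearson correlation can only be +1, -1, or undefined, whatever the actual ratings are. This is one way a traditional similarity measure can become uninformative when users have rated very few items, which is the kind of sparse context the abstract targets.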