A comparison of several predictive algorithms for collaborative filtering on multi-valued ratings

  • Authors:
  • Maritza L. Calderón-Benavides;Cristina N. González-Caro;José de J. Pérez-Alcázar;Juan C. García-Díaz;Joaquin Delgado

  • Affiliations:
  • Universidad Autónoma de Bucaramanga, Bucaramanga, Colombia;Universidad Autónoma de Bucaramanga, Bucaramanga, Colombia;Universidad Autónoma de Bucaramanga, Bucaramanga, Colombia;Universidad Autónoma de Bucaramanga, Bucaramanga, Colombia;TripleHop Technologies, New York, NY

  • Venue:
  • Proceedings of the 2004 ACM symposium on Applied computing
  • Year:
  • 2004

Abstract

The basic objective of a predictive algorithm for collaborative filtering (CF) is to suggest items to a particular user based on his/her preferences and those of other users with similar interests. Many algorithms have been proposed for CF, and some works comparing subsets of them can be found in the literature; however, more comprehensive comparisons are not available. In this work, a meaningful sample of CF algorithms widely reported in the literature was chosen for analysis; they represent different stages in the evolution of CF, starting from simple user correlations, going through online learning, up to methods that use classification techniques. Our main purpose is to compare these algorithms when applied to multi-valued ratings. Experiments were conducted on three well-known datasets with different characteristics, using two protocols and four evaluation metrics representing coverage, accuracy, reliability and agreement of predictions with respect to real values. The results showed that the memory-based method is a good option because its results are more precise and reliable than those of the other methods. Online learning methods exhibit a good level of accuracy with low variation, which makes them reliable models. Support Vector Machines, on the other hand, generate predictions with acceptable agreement; however, their accuracy depends on the characteristics of the input data. Finally, Dependency Networks did not offer good results when applied to multi-valued ratings. The experiments confirm that the characteristics of the datasets remain an important factor in the performance of the methods.
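The abstract does not give the exact formulation of the memory-based method or of the accuracy metric, so the following is only a minimal sketch of one common textbook variant: user-to-user Pearson correlation with a mean-centred weighted average, scored with mean absolute error. The toy ratings, the user and item names, and the helper names `pearson`, `predict` and `mean_absolute_error` are all assumptions made for illustration; they are not taken from the paper.

```python
from math import sqrt

# Hypothetical toy data on a 1-5 scale (not from the paper): user -> {item: rating}.
RATINGS = {
    "ana":   {"a": 5, "b": 3, "c": 4, "d": 4},
    "ben":   {"a": 3, "b": 1, "c": 2, "d": 3, "e": 3},
    "carla": {"a": 4, "b": 3, "c": 4, "d": 3, "e": 5},
    "dan":   {"a": 3, "b": 3, "c": 1, "d": 5, "e": 4},
}


def mean(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def pearson(ratings, u, v):
    """Pearson correlation between users u and v over their co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    mu_u = mean(ratings[u][i] for i in common)
    mu_v = mean(ratings[v][i] for i in common)
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common))
    den = den_u * den_v
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict a rating as the user's mean plus a correlation-weighted
    average of the neighbours' deviations from their own means.

    Returns None when no correlated neighbour has rated the item; such
    cases are what a coverage metric would count as unpredictable.
    """
    base = mean(ratings[user].values())
    num = 0.0
    den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(ratings, user, other)
        num += w * (theirs[item] - mean(theirs.values()))
        den += abs(w)
    if den == 0:
        return None
    return base + num / den


def mean_absolute_error(pairs):
    """Average absolute gap over (predicted, actual) rating pairs."""
    return mean(abs(p - a) for p, a in pairs)


if __name__ == "__main__":
    print(predict(RATINGS, "ana", "e"))
    print(mean_absolute_error([(3.0, 4), (5.0, 5)]))
```

In this sketch the prediction is not clipped to the rating scale, and the correlation uses means over co-rated items while the prediction uses each user's overall mean; both are design choices of this variant and may differ from the methods evaluated in the paper.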