The recommender-systems literature typically distinguishes between two broad approaches to measuring recommendation accuracy: rating prediction, usually quantified by the root mean square error (RMSE), and ranking, measured by metrics such as precision and recall. In this paper, we examine both approaches in detail and find that the key difference lies not in the metrics themselves but in the training and test data considered: rating prediction is concerned with only the observed ratings, while ranking typically accounts for all items in the collection, whether the user has rated them or not. Furthermore, we show that predicting observed ratings, while popular in the literature, solves only a (small) part of the real-world task of predicting the rating of any item in the collection. The reasons are selection bias in the data, combined with data sparsity. We show that this broader rating-prediction task contains the prediction task 'Who Rated What' as a sub-problem, which can be cast as a classification or ranking problem. This suggests that solving the ranking problem is valuable not only in itself, but also for predicting the rating value of any item.
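The difference in the data each protocol considers can be sketched as follows. The toy rating matrix, the model scores, and the cutoff k are all hypothetical choices for illustration; the predicted scores are assumed to come from some already-trained model. RMSE is computed over observed test ratings only, whereas precision@k ranks all items per user, rated or not:

```python
import numpy as np

# Hypothetical data: 3 users x 5 items; np.nan marks unobserved entries.
ratings = np.array([
    [5.0, np.nan, 3.0, np.nan, 1.0],
    [np.nan, 4.0, np.nan, 2.0, np.nan],
    [4.0, 5.0, np.nan, np.nan, 2.0],
])
# Predicted scores from some model, for ALL user-item pairs.
scores = np.array([
    [4.5, 3.0, 3.5, 2.0, 1.5],
    [3.0, 4.2, 2.5, 2.1, 1.0],
    [4.1, 4.8, 3.0, 2.2, 2.5],
])

def rmse_observed(ratings, scores):
    """Rating prediction: error is measured on observed ratings only."""
    mask = ~np.isnan(ratings)
    err = ratings[mask] - scores[mask]
    return np.sqrt(np.mean(err ** 2))

def precision_at_k(ratings, scores, k=2, threshold=4.0):
    """Ranking: all items are ranked per user; unrated items that land
    in the top-k count as misses (relevant = observed rating >= threshold)."""
    hits, total = 0, 0
    for u in range(ratings.shape[0]):
        top_k = np.argsort(-scores[u])[:k]  # indices of the k highest scores
        for i in top_k:
            total += 1
            if not np.isnan(ratings[u, i]) and ratings[u, i] >= threshold:
                hits += 1
    return hits / total
```

Note that a model can score well on `rmse_observed` while placing many never-rated items into the top-k lists that `precision_at_k` penalizes, which is exactly the gap between the two protocols discussed above.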