Ranking with non-random missing ratings: influence of popularity and positivity on evaluation metrics

Authors: Bruno Pradel; Nicolas Usunier; Patrick Gallinari
Affiliations: LIP6 - Université Paris 6, Paris, France (all authors)
Venue: Proceedings of the sixth ACM conference on Recommender systems
Year: 2012

Abstract

The evaluation of recommender systems in terms of ranking has recently gained attention, as it seems to fit the top-k recommendation task better than the usual rating prediction task. In that context, several authors have proposed treating missing ratings as a form of negative feedback, to compensate for the skewed distribution of observed ratings that arises when users choose which items they rate. In this work, we study two major biases in the selection of items: first, some items obtain more ratings than others (popularity effect); second, positive ratings are observed more frequently than negative ratings (positivity effect). We present a theoretical analysis and experiments on the Yahoo! dataset with randomly selected items, which show that considering missing data as a form of negative feedback during training may improve performance, but also that it can be misleading when testing, favoring models of popularity over models of user preferences.
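
The testing concern described in the abstract can be illustrated with a small sketch. It is not the paper's analysis or its metric. It is a toy precision@k computation for a single user, assuming a hypothetical helper `precision_at_k`, made-up item ids and ratings, and a "liked" threshold of 4. It compares two ways of handling unrated items at test time: counting them as non-relevant, or dropping them from the ranking before scoring.

```python
def precision_at_k(ranking, rated, k, like_threshold=4, missing_as_negative=True):
    """Precision@k for one user (illustrative helper, not from the paper).

    ranking: list of item ids, best first.
    rated: dict mapping item id -> observed rating.
    missing_as_negative: if True, unrated items count as non-relevant;
        if False, unrated items are removed from the ranking before truncation.
    """
    if not missing_as_negative:
        ranking = [item for item in ranking if item in rated]
    top = ranking[:k]
    if not top:
        return 0.0
    hits = sum(1 for item in top if rated.get(item, 0) >= like_threshold)
    return hits / len(top)


# Toy data (assumed): the user rated two popular items and one niche item.
# Items n1 and n2 are niche and unrated, so their true relevance is unknown.
rated = {"p1": 5, "p2": 4, "n3": 5}

# A ranking by item popularity and a hypothetical personalized ranking.
popularity_ranking = ["p1", "p2", "n3", "n1", "n2"]
preference_ranking = ["n1", "n3", "p1", "n2", "p2"]

pop_neg = precision_at_k(popularity_ranking, rated, k=2, missing_as_negative=True)
pref_neg = precision_at_k(preference_ranking, rated, k=2, missing_as_negative=True)
pop_obs = precision_at_k(popularity_ranking, rated, k=2, missing_as_negative=False)
pref_obs = precision_at_k(preference_ranking, rated, k=2, missing_as_negative=False)

print("missing as negative:", pop_neg, pref_neg)
print("observed items only:", pop_obs, pref_obs)
```

On this toy data, the popularity ranking scores higher than the personalized ranking when missing ratings count as negatives, because the personalized ranking is penalized for placing an unrated niche item first. When only observed items are scored, the two rankings tie. Neither protocol is unbiased in general, which is why the abstract relies on ratings of randomly selected items for testing.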