The evaluation of recommender systems in terms of ranking has recently gained attention, as it seems to fit the top-k recommendation task better than the usual rating prediction task does. In that context, several authors have proposed treating missing ratings as a form of negative feedback, to compensate for the skewed distribution of observed ratings that arises when users choose which items to rate. In this work, we study two major biases in the selection of items: first, some items obtain more ratings than others (popularity effect); second, positive ratings are observed more frequently than negative ratings (positivity effect). We present a theoretical analysis and experiments on the Yahoo! dataset with randomly selected items, which show that treating missing data as a form of negative feedback during training may improve performance, but that it can be misleading when testing, favoring models of popularity over models of user preferences.
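The testing concern can be illustrated with a toy sketch. This is not the paper's experimental protocol: the catalogue, the two rankers, the user's ratings, and the helper `precision_at_k` are all hypothetical, chosen only to show how counting missing ratings as negatives can reward a popularity ranking over a ranking that matches the user's actual preferences.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked items that appear in `relevant`.

    Any item absent from `relevant` counts as a negative, which is the
    missing-as-negative convention when `relevant` holds only observed likes.
    """
    top_k = ranking[:k]
    return sum(1 for item in top_k if item in relevant) / k


# Hypothetical catalogue of six items; a lower id means a more popular item.
# Items the user truly likes (what a randomly sampled test set could reveal).
true_likes = {1, 3, 4, 5}

# The user only rated the popular items 0, 1 and 2, and liked item 1.
# The likes for the unpopular items 3, 4 and 5 are therefore missing.
observed_likes = {1}

# Two hypothetical rankers: one orders items by popularity, the other
# places the user's truly liked items first.
popularity_ranking = [0, 1, 2, 3, 4, 5]
preference_ranking = [3, 4, 5, 1, 0, 2]

k = 2
# Missing-as-negative test: only observed likes count as relevant.
biased_pop = precision_at_k(popularity_ranking, observed_likes, k)
biased_pref = precision_at_k(preference_ranking, observed_likes, k)

# Test against the complete preferences.
full_pop = precision_at_k(popularity_ranking, true_likes, k)
full_pref = precision_at_k(preference_ranking, true_likes, k)

print("missing-as-negative:", biased_pop, biased_pref)
print("complete preferences:", full_pop, full_pref)
```

In this constructed case the popularity ranking scores higher under the missing-as-negative test, while the preference ranking scores higher once the complete preferences are used, which is the kind of reversal the abstract warns about.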