A Survey of Accuracy Evaluation Metrics of Recommendation Tasks

Authors: Asela Gunawardana; Guy Shani
Venue: The Journal of Machine Learning Research
Year: 2009

Abstract

Recommender systems are now popular both commercially and in the research community, where many algorithms have been suggested for providing recommendations. These algorithms typically perform differently in various domains and tasks. Therefore, it is important from the research perspective, as well as from a practical view, to be able to decide on an algorithm that matches the domain and the task of interest. The standard way to make such decisions is by comparing a number of algorithms offline using some evaluation metric. Indeed, many evaluation metrics have been suggested for comparing recommendation algorithms. The decision on the proper evaluation metric is often critical, as each metric may favor a different algorithm. In this paper we review the proper construction of offline experiments for deciding on the most appropriate algorithm. We discuss three important tasks of recommender systems and classify a set of appropriate, well-known evaluation metrics for each task. We demonstrate how using an improper evaluation metric can lead to the selection of an improper algorithm for the task of interest. We also discuss other important considerations when designing offline experiments.
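The claim that each metric may favor a different algorithm can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy ratings for one held-out user, the two hypothetical algorithms `PRED_A` and `PRED_B`, the relevance threshold of 4 on a 1-5 scale, and the cutoff k = 2. RMSE stands in for a rating-prediction metric and precision at k for a metric that scores a list of recommended items.

```python
import math

# Hypothetical toy data (assumption): true 1-5 ratings of one held-out user,
# plus predictions from two made-up algorithms.
TRUE = {"a": 5.0, "b": 4.0, "c": 2.0, "d": 1.0}
PRED_A = {"a": 4.5, "b": 3.0, "c": 3.5, "d": 1.5}  # values close, order partly wrong
PRED_B = {"a": 3.0, "b": 2.9, "c": 2.8, "d": 2.7}  # values far off, order correct


def rmse(true, pred):
    """Root mean squared error over all items that have a true rating."""
    squared = [(pred[item] - rating) ** 2 for item, rating in true.items()]
    return math.sqrt(sum(squared) / len(squared))


def precision_at_k(true, pred, k, threshold=4.0):
    """Fraction of the k highest-scored items whose true rating is >= threshold."""
    top_k = sorted(pred, key=pred.get, reverse=True)[:k]
    relevant = {item for item, rating in true.items() if rating >= threshold}
    return sum(1 for item in top_k if item in relevant) / k


if __name__ == "__main__":
    for name, pred in [("A", PRED_A), ("B", PRED_B)]:
        print(f"{name}: RMSE={rmse(TRUE, pred):.3f}  P@2={precision_at_k(TRUE, pred, 2):.2f}")
```

On this data RMSE prefers A (its errors are smaller), while precision at 2 prefers B (its two top-scored items are exactly the two relevant ones). Which algorithm "wins" therefore depends on whether the task of interest is predicting ratings or recommending a short list of good items.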