Offline evaluations are the most common evaluation method for research paper recommender systems. However, despite some voiced criticism, no thorough discussion of the appropriateness of offline evaluations has taken place. We conducted a study in which we evaluated various recommendation approaches with both offline and online evaluations. We found that the results of offline and online evaluations often contradict each other. We discuss this finding in detail and conclude that, in many settings, offline evaluations may be inappropriate for evaluating research paper recommender systems.
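To make the contrast concrete, the sketch below illustrates the two kinds of measurements the abstract refers to: an offline accuracy metric (precision@k computed against held-out relevant papers) and an online metric (click-through rate from real user interactions). This is only a minimal illustration with made-up data; it is not the authors' evaluation pipeline, and the function names and example values are hypothetical.

    # Illustrative sketch only -- not the authors' actual evaluation code.
    # All paper identifiers and counts below are made up for demonstration.

    def precision_at_k(recommended, relevant, k):
        """Offline metric: fraction of the top-k recommendations that appear
        in the held-out set of relevant papers (e.g., papers the user had
        already collected or cited)."""
        top_k = recommended[:k]
        hits = sum(1 for item in top_k if item in relevant)
        return hits / k

    def click_through_rate(shown, clicked):
        """Online metric: fraction of displayed recommendations that users
        actually clicked."""
        return clicked / shown if shown else 0.0

    if __name__ == "__main__":
        # Hypothetical offline evaluation: held-out items serve as ground truth.
        recommended = ["paper_A", "paper_B", "paper_C", "paper_D", "paper_E"]
        relevant = {"paper_B", "paper_E", "paper_F"}
        print(f"offline precision@5: {precision_at_k(recommended, relevant, 5):.2f}")

        # Hypothetical online evaluation: the same approach shown to real users.
        print(f"online CTR: {click_through_rate(shown=1000, clicked=62):.3f}")

The point of showing both is that they measure different things: the offline metric rewards reproducing a fixed ground-truth set, while the online metric reflects what users actually choose to click, which is one plausible reason the two evaluation methods can rank the same approaches differently.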