The standard "Cranfield" approach to the evaluation of information retrieval systems has been used and refined for nearly fifty years, and has been a key element in the development of large-scale retrieval systems. The resources created by such systematic evaluations have enabled thorough retrospective investigation of the strengths and limitations of particular variants of this evaluation approach; over the last few years, such investigation has for example led to identification of serious flaws in some experiments. Knowledge of these flaws can prevent their perpetuation into future work and informs the design of new experiments and infrastructures. In this position statement we briefly review some aspects of evaluation and, based on our research and observations over the last decade, outline some principles on which we believe new infrastructure should rest.