IR research: systems, interaction, evaluation and theories
ACM SIGIR Forum
DIRECTions: design and specification of an IR evaluation infrastructure
CLEF'12 Proceedings of the Third international conference on Information Access Evaluation: multilinguality, multimodality, and visual analytics
Cumulated relative position: a metric for ranking evaluation
CLEF'12 Proceedings of the Third international conference on Information Access Evaluation: multilinguality, multimodality, and visual analytics
User-Oriented evaluation in IR
PROMISE'12 Proceedings of the 2012 international conference on Information Retrieval Meets Information Visualization
Implementing crowdsourcing-based relevance experimentation: an industrial perspective
Information Retrieval
Keyword search and evaluation over relational databases: an outlook to the future
Proceedings of the 7th International Workshop on Ranking in Databases
Evaluation as a service for information retrieval
ACM SIGIR Forum
Evaluation in Music Information Retrieval
Journal of Intelligent Information Systems
Evaluation has always played a major role in information retrieval, with early pioneers such as Cyril Cleverdon and Gerard Salton laying the foundations for most of the evaluation methodologies in use today. The retrieval community has been extremely fortunate to have such a well-grounded evaluation paradigm during a period when most of the human language technologies were just developing. This lecture explains where these evaluation methodologies came from and how they have continued to adapt to the vastly changed environment of today's search engine world. The lecture opens with a discussion of the early evaluation of information retrieval systems, beginning with the Cranfield testing in the early 1960s, continuing with the Lancaster "user" study for MEDLARS, and presenting the various test collection investigations by the SMART project and by groups in Britain. The emphasis in this chapter is on the how and the why of the various methodologies developed. The second chapter covers the more recent "batch" evaluations, examining the methodologies used in the various open evaluation campaigns such as TREC, NTCIR (emphasis on Asian languages), CLEF (emphasis on European languages), INEX (emphasis on semi-structured data), etc. Here again the focus is on the how and the why, and in particular on how the older evaluation methodologies evolved to handle new information access techniques. This includes how the test collection techniques were modified and how the metrics were changed to better reflect operational environments. The final chapters look at evaluation issues in user studies (the interactive part of information retrieval), including a look at the search log studies mainly done by the commercial search engines. Here the goal is to show, via case studies, how the high-level issues of experimental design affect the final evaluations.
Table of Contents: Introduction and Early History / "Batch" Evaluation Since 1992 / Interactive Evaluation / Conclusion