Information Retrieval Evaluation

Authors:
Donna Harman
Affiliations:
-
Venue:
Information Retrieval Evaluation
Year:
2011

Abstract

Evaluation has always played a major role in information retrieval, with early pioneers such as Cyril Cleverdon and Gerard Salton laying the foundations for most of the evaluation methodologies in use today. The retrieval community has been extremely fortunate to have such a well-grounded evaluation paradigm during a period when most of the human language technologies were just developing. This lecture aims to explain where these evaluation methodologies came from and how they have continued to adapt to the vastly changed environment of today's search engine world. The lecture opens with a discussion of the early evaluation of information retrieval systems, beginning with the Cranfield testing in the early 1960s, continuing with the Lancaster "user" study for MEDLARS, and presenting the various test collection investigations by the SMART project and by groups in Britain. The emphasis in this chapter is on the how and the why of the various methodologies developed. The second chapter covers the more recent "batch" evaluations, examining the methodologies used in the various open evaluation campaigns such as TREC, NTCIR (emphasis on Asian languages), CLEF (emphasis on European languages), INEX (emphasis on semi-structured data), etc. Here again the focus is on the how and the why, and in particular on the evolution of the older evaluation methodologies to handle new information access techniques. This includes how the test collection techniques were modified and how the metrics were changed to better reflect operational environments. The final chapters look at evaluation issues in user studies, the interactive part of information retrieval, including a look at the search log studies mainly done by the commercial search engines. Here the goal is to show, via case studies, how the high-level issues of experimental design affect the final evaluations.
Table of Contents: Introduction and Early History / "Batch" Evaluation Since 1992 / Interactive Evaluation / Conclusion