System evaluation has mattered since research on automatic language and information processing began. However, the (D)ARPA conferences have raised the stakes substantially, both by requiring and delivering systematic evaluations and by sustaining these through long-term programmes; it has been claimed that this has significantly raised task performance, as defined by appropriate effectiveness measures, and has promoted relevant engineering development. These controlled laboratory evaluations have, however, made very strong assumptions about the task context. The paper examines these assumptions for six task areas, considers their impact on evaluation and performance results, and argues that for current tasks of interest, e.g. summarising, it is now essential to play down the present narrowly defined performance measures in order to address the task context, and specifically the role of the human participant in the task, so that new measures of greater value can be developed and applied.
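The "narrowly defined performance measures" at issue are typified by set-based effectiveness scores such as precision, recall and the F-measure, computed against fixed gold-standard judgements. As a minimal illustrative sketch only (the paper prescribes no implementation, and the document IDs below are hypothetical placeholders), the following Python fragment shows how such measures are commonly computed for a single query:

```python
def effectiveness(retrieved, relevant):
    """Set-based precision, recall and F1 for one query.

    retrieved -- document IDs returned by the system
    relevant  -- document IDs judged relevant (gold standard)
    Both arguments are hypothetical; real evaluations such as
    TREC aggregate judgements over many queries and assessors.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 3 of 5 retrieved documents are relevant,
# out of 4 relevant documents in total.
p, r, f = effectiveness({"d1", "d2", "d3", "d4", "d5"},
                        {"d1", "d3", "d5", "d9"})
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.60 R=0.75 F1=0.67
```

Measures of this kind deliberately abstract away from the surrounding task context, including the human participant, which is precisely the limitation the paper argues must now be addressed.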