Both research and evaluation in human language technology have enjoyed a major surge over the last fifteen years. Performance has advanced substantially, due in part to the availability of resources and to the interest generated by the many evaluation forums active today. But much more remains to be done, both in terms of new areas of research and in improved evaluation for those areas. This paper describes the current state-of-the-art in evaluation and then discusses some ideas for improving it.