Development of an instrument measuring user satisfaction of the human-computer interface
CHI '88 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
The pragmatics of information retrieval experimentation, revisited
Information Processing and Management: an International Journal - Special issue on evaluation issues in information retrieval
The concept of relevance in IR
Journal of the American Society for Information Science and Technology
PARADISE: a framework for evaluating spoken dialogue agents
ACL '97 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Context-based question-answering evaluation
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Glass Box: An Instrumented Infrastructure for Supporting Human Interaction with Information
HICSS '05 Proceedings of the 38th Annual Hawaii International Conference on System Sciences - Volume 09
TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing)
Cross-Evaluation: A new model for information system evaluation
Journal of the American Society for Information Science and Technology
Glass box: capturing, archiving, and retrieving workstation activities
Proceedings of the 3rd ACM workshop on Continuous archival and retrieval of personal experiences
Advances in Open Domain Question Answering (Text, Speech and Language Technology)
Experiments with interactive question-answering
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
Journal of the American Society for Information Science and Technology
A model for quantitative evaluation of an end-to-end question-answering system
Journal of the American Society for Information Science and Technology
User-centered evaluation of interactive question answering systems
IQA '06 Proceedings of the Interactive Question Answering Workshop at HLT-NAACL 2006
Evaluating interactive question answering (QA) systems with real users is challenging: traditional evaluation measures based on the relevance of returned items are difficult to employ because relevance judgments can be unstable in multi-user evaluations. The work reported in this paper evaluates the effectiveness of three questionnaires at distinguishing among a set of interactive QA systems: a Cognitive Workload Questionnaire (NASA TLX) and Task and System Questionnaires customized to a specific interactive QA application. The questionnaires were evaluated with four systems, seven analysts, and eight scenarios during a two-week workshop. Overall, results demonstrate that all three questionnaires are effective at distinguishing among systems, with the Task Questionnaire being the most sensitive. Results also provide initial support for the validity and reliability of the questionnaires.
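The abstract does not specify the statistical procedures used; as a minimal sketch of the kind of analysis such a design implies, the Python snippet below estimates questionnaire reliability with Cronbach's alpha and tests whether questionnaire scores distinguish among systems with a one-way ANOVA. Everything here is illustrative: the cronbach_alpha helper, the 6-item 7-point scale, and the randomly generated scores are assumptions, not details from the paper, whose actual instruments and statistics may differ.

# Hypothetical sketch: reliability and system-discrimination analysis
# for questionnaire data. Data and item counts are fabricated.
import numpy as np
from scipy import stats

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(0)
# Illustrative setup: 7 analysts rate each of 4 systems on a
# 6-item, 7-point questionnaire (random here, structured in reality).
scores = {s: rng.integers(1, 8, size=(7, 6)).astype(float) for s in "ABCD"}

# Reliability of the instrument, pooled over all responses.
pooled = np.vstack(list(scores.values()))
print(f"Cronbach's alpha: {cronbach_alpha(pooled):.2f}")

# One-way ANOVA on per-analyst mean scores: does the questionnaire
# distinguish among the four systems?
f, p = stats.f_oneway(*(m.mean(axis=1) for m in scores.values()))
print(f"F = {f:.2f}, p = {p:.3f}")

On real questionnaire data, a sensitive instrument would show a significant ANOVA effect across systems and an alpha well above the random-data baseline produced by this sketch.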