Development of an instrument measuring user satisfaction of the human-computer interface
CHI '88 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
The pragmatics of information retrieval experimentation, revisited
Information Processing and Management: an International Journal - Special issue on evaluation issues in information retrieval
The concept of relevance in IR
Journal of the American Society for Information Science and Technology
PARADISE: a framework for evaluating spoken dialogue agents
ACL '97 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Context-based question-answering evaluation
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Glass Box: An Instrumented Infrastructure for Supporting Human Interaction with Information
HICSS '05 Proceedings of the 38th Annual Hawaii International Conference on System Sciences - Volume 09
TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing)
Cross-Evaluation: A new model for information system evaluation
Journal of the American Society for Information Science and Technology
Glass box: capturing, archiving, and retrieving workstation activities
Proceedings of the 3rd ACM workshop on Continuous archival and retrieval of personal experiences
Advances in Open Domain Question Answering (Text, Speech and Language Technology)
Experiments with interactive question-answering
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
Journal of the American Society for Information Science and Technology
A model for quantitative evaluation of an end-to-end question-answering system
Journal of the American Society for Information Science and Technology
User-centered evaluation of interactive question answering systems
IQA '06 Proceedings of the Interactive Question Answering Workshop at HLT-NAACL 2006
Evaluating interactive question answering (QA) systems with real users is challenging: traditional evaluation measures based on the relevance of returned items are difficult to employ because relevance judgments can be unstable in multi-user evaluations. The work reported in this paper evaluates the effectiveness of three questionnaires at distinguishing among a set of interactive QA systems: a Cognitive Workload Questionnaire (NASA TLX) and Task and System Questionnaires customized to a specific interactive QA application. The questionnaires were evaluated with four systems, seven analysts, and eight scenarios during a two-week workshop. Overall, results demonstrate that all three questionnaires are effective at distinguishing among systems, with the Task Questionnaire being the most sensitive. Results also provide initial support for the validity and reliability of the questionnaires.
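The abstract does not specify the statistical procedures used; as a minimal sketch of the kind of analysis such a design implies, the Python snippet below estimates questionnaire reliability with Cronbach's alpha and tests whether questionnaire scores distinguish among systems with a one-way ANOVA. Everything here is illustrative: the cronbach_alpha helper, the 6-item 7-point scale, and the randomly generated scores are assumptions, not details from the paper, whose actual instruments and statistics may differ.

# Hypothetical sketch: reliability and system-discrimination analysis
# for questionnaire data. Data and item counts are fabricated.
import numpy as np
from scipy import stats

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(0)
# Illustrative setup: 7 analysts rate each of 4 systems on a
# 6-item, 7-point questionnaire (random here, structured in reality).
scores = {s: rng.integers(1, 8, size=(7, 6)).astype(float) for s in "ABCD"}

# Reliability of the instrument, pooled over all responses.
pooled = np.vstack(list(scores.values()))
print(f"Cronbach's alpha: {cronbach_alpha(pooled):.2f}")

# One-way ANOVA on per-analyst mean scores: does the questionnaire
# distinguish among the four systems?
f, p = stats.f_oneway(*(m.mean(axis=1) for m in scores.values()))
print(f"F = {f:.2f}, p = {p:.3f}")

On real questionnaire data, a sensitive instrument would show a significant ANOVA effect across systems and an alpha well above the random-data baseline produced by this sketch.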