Empirical methods for evaluating dialog systems

Authors: Tim Paek
Affiliations: Microsoft Research, Redmond, WA
Venue: SIGDIAL '01 Proceedings of the Second SIGdial Workshop on Discourse and Dialogue - Volume 16
Year: 2001

Citing 5

Empirically evaluating an adaptable spoken dialogue system
UM '99 Proceedings of the seventh international conference on User modeling

A computational architecture for conversation
UM '99 Proceedings of the seventh international conference on User modeling

Designing Interactive Speech Systems: From First Ideas to User Testing
Designing Interactive Speech Systems: From First Ideas to User Testing

Conversation as Action Under Uncertainty
UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence

PARADISE: a framework for evaluating spoken dialogue agents
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics

Cited 2

Towards context-adaptive utterance interpretation
SIGDIAL '02 Proceedings of the 3rd SIGdial workshop on Discourse and dialogue - Volume 2

Evaluating multimodal systems: a comparison of established questionnaires and interaction parameters
Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries

Abstract

We examine what purpose a dialog metric serves and then propose empirical methods for evaluating systems that meet that purpose. The methods include a protocol for conducting a wizard-of-oz experiment and a basic set of descriptive statistics for substantiating performance claims using the data collected from the experiment as an ideal benchmark or "gold standard" for comparative judgments. The methods also provide a practical means of optimizing the system through component analysis and cost valuation.