NAACL-HLT-Dialog '07 Proceedings of the Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies
We examine what purpose a dialog metric serves and then propose empirical methods for evaluating systems that meet that purpose. The methods include a protocol for conducting a Wizard-of-Oz experiment and a basic set of descriptive statistics for substantiating performance claims, using the data collected from the experiment as an ideal benchmark, or "gold standard," for comparative judgments. The methods also provide a practical means of optimizing the system through component analysis and cost valuation.