Evaluating user simulations with the Cramér-von Mises divergence

  • Authors: Jason D. Williams
  • Affiliation: AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932, USA
  • Venue: Speech Communication
  • Year: 2008

Abstract

User simulations are increasingly employed in the development and evaluation of spoken dialog systems. However, there is no accepted method for evaluating user simulations, which is problematic because the performance of new dialog management techniques is often evaluated on user simulations alone, not on real people. In this paper, we propose a novel method of evaluating user simulations. We view a user simulation as a predictor of the performance of a dialog system, where per-dialog performance is measured with a domain-specific scoring function. The divergence between the distributions of dialog scores in the real and simulated corpora provides a measure of the quality of the user simulation, and we argue that the Cramér–von Mises divergence is well-suited to this task. To demonstrate this technique, we study a corpus of callers with real information needs and show that the Cramér–von Mises divergence conforms to expectations. Finally, we present simple tools which enable practitioners to interpret the statistical significance of comparisons between user simulations.
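
As a concrete illustration of the measure described in the abstract, the sketch below estimates a Cramér–von Mises-style divergence between the empirical distributions of real and simulated dialog scores. It uses one common empirical form: the root-mean-square gap between the two empirical CDFs, evaluated at the real-corpus scores. The paper's exact normalization may differ, and the function and variable names here are illustrative, not taken from the paper.

```python
import numpy as np

def ecdf(sample, points):
    """Empirical CDF of `sample` evaluated at each value in `points`."""
    sample = np.sort(np.asarray(sample))
    # Fraction of sample values <= each point.
    return np.searchsorted(sample, points, side="right") / sample.size

def cvm_divergence(real_scores, sim_scores):
    """Cramér–von Mises-style divergence between two score samples.

    Evaluates the squared gap between the two empirical CDFs at the
    real-corpus scores and returns its root mean square. This is one
    common empirical form; the paper's normalization may differ.
    """
    pts = np.asarray(real_scores)
    f = ecdf(real_scores, pts)   # ECDF of real dialog scores
    g = ecdf(sim_scores, pts)    # ECDF of simulated dialog scores
    return np.sqrt(np.mean((f - g) ** 2))

# Example: per-dialog scores from a real corpus and a user simulation.
real = [0.8, 0.5, 0.9, 0.7, 0.6]
sim = [0.6, 0.4, 0.9, 0.5, 0.7]
print(cvm_divergence(real, sim))  # smaller = better match to real scores
```

For significance testing along the lines of the abstract's final sentence, SciPy's `scipy.stats.cramervonmises_2samp` provides a standard two-sample Cramér–von Mises test with a p-value, though its statistic is normalized differently from the divergence sketched above.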