Evaluating user simulations with the Cramér-von Mises divergence

  • Authors: Jason D. Williams
  • Affiliation: AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932, USA
  • Venue: Speech Communication
  • Year: 2008

Abstract

User simulations are increasingly employed in the development and evaluation of spoken dialog systems. However, there is no accepted method for evaluating user simulations, which is problematic because the performance of new dialog management techniques is often evaluated on user simulations alone, not on real people. In this paper, we propose a novel method of evaluating user simulations. We view a user simulation as a predictor of the performance of a dialog system, where per-dialog performance is measured with a domain-specific scoring function. The divergence between the distributions of dialog scores in the real and simulated corpora provides a measure of the quality of the user simulation, and we argue that the Cramér–von Mises divergence is well-suited to this task. To demonstrate this technique, we study a corpus of callers with real information needs and show that the Cramér–von Mises divergence conforms to expectations. Finally, we present simple tools which enable practitioners to interpret the statistical significance of comparisons between user simulations.
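
As a concrete illustration of the measure described in the abstract, the sketch below estimates a Cramér–von Mises-style divergence between the empirical distributions of real and simulated dialog scores. It uses one common empirical form: the root-mean-square gap between the two empirical CDFs, evaluated at the real-corpus scores. The paper's exact normalization may differ, and the function and variable names here are illustrative, not taken from the paper.

```python
import numpy as np

def ecdf(sample, points):
    """Empirical CDF of `sample` evaluated at each value in `points`."""
    sample = np.sort(np.asarray(sample))
    # Fraction of sample values <= each point.
    return np.searchsorted(sample, points, side="right") / sample.size

def cvm_divergence(real_scores, sim_scores):
    """Cramér–von Mises-style divergence between two score samples.

    Evaluates the squared gap between the two empirical CDFs at the
    real-corpus scores and returns its root mean square. This is one
    common empirical form; the paper's normalization may differ.
    """
    pts = np.asarray(real_scores)
    f = ecdf(real_scores, pts)   # ECDF of real dialog scores
    g = ecdf(sim_scores, pts)    # ECDF of simulated dialog scores
    return np.sqrt(np.mean((f - g) ** 2))

# Example: per-dialog scores from a real corpus and a user simulation.
real = [0.8, 0.5, 0.9, 0.7, 0.6]
sim = [0.6, 0.4, 0.9, 0.5, 0.7]
print(cvm_divergence(real, sim))  # smaller = better match to real scores
```

For significance testing along the lines of the abstract's final sentence, SciPy's `scipy.stats.cramervonmises_2samp` provides a standard two-sample Cramér–von Mises test with a p-value, though its statistic is normalized differently from the divergence sketched above.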