Empirical methods for evaluating dialog systems

  • Authors: Tim Paek
  • Affiliations: Microsoft Research, Redmond, WA
  • Venue: ELDS '01 Proceedings of the workshop on Evaluation for Language and Dialogue Systems - Volume 9
  • Year: 2001

Abstract

We examine what purpose a dialog metric serves and then propose empirical methods for evaluating dialog systems that meet that purpose. The methods include a protocol for conducting a wizard-of-oz experiment and a basic set of descriptive statistics for substantiating performance claims, using the data collected from the experiment as an ideal benchmark or "gold standard" for making comparative judgments. The methods also provide a practical means of optimizing the system through component analysis and cost valuation.
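The gold-standard comparison described above can be sketched as follows. This is a hypothetical illustration, not the paper's actual procedure: the metric (per-task success rate), the function name, and the data are all assumptions made for the example. The idea is simply to summarize, with basic descriptive statistics, how far the system falls short of the wizard-of-oz benchmark on the same tasks.

```python
# Hypothetical sketch: comparing a dialog system's per-task scores against
# wizard-of-oz ("gold standard") scores using basic descriptive statistics.
# Metric, function names, and numbers are illustrative, not from the paper.
from statistics import mean, stdev

def compare_to_gold(system_scores, wizard_scores):
    """Summarize system performance relative to the wizard-of-oz
    benchmark on the same set of dialog tasks."""
    assert len(system_scores) == len(wizard_scores)
    gaps = [w - s for s, w in zip(system_scores, wizard_scores)]
    return {
        "system_mean": mean(system_scores),
        "wizard_mean": mean(wizard_scores),
        "mean_gap": mean(gaps),    # average shortfall vs. the wizard
        "gap_stdev": stdev(gaps),  # variability of the shortfall
    }

# Illustrative task-success rates for five dialog scenarios.
system = [0.70, 0.55, 0.80, 0.60, 0.75]
wizard = [0.95, 0.90, 1.00, 0.85, 0.90]
print(compare_to_gold(system, wizard))
```

A per-component breakdown of the same gap (e.g., recognition vs. dialog management errors) would extend this toward the component analysis and cost valuation the abstract mentions.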