Empirical methods for evaluating dialog systems

  • Authors: Tim Paek
  • Affiliations: Microsoft Research, Redmond, WA
  • Venue: SIGDIAL '01 Proceedings of the Second SIGdial Workshop on Discourse and Dialogue - Volume 16
  • Year: 2001

Abstract

We examine what purpose a dialog metric serves and then propose empirical methods for evaluating systems that meet that purpose. The methods include a protocol for conducting a Wizard-of-Oz experiment and a basic set of descriptive statistics for substantiating performance claims, using the data collected from the experiment as an ideal benchmark, or "gold standard," for comparative judgments. The methods also provide a practical means of optimizing the system through component analysis and cost valuation.
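
The abstract does not say which statistics or metrics the paper uses, so the following is only a minimal sketch of the general idea of treating Wizard-of-Oz data as a benchmark for comparative judgments. The function names (`describe`, `compare_to_gold`), the choice of mean and standard deviation, and the per-dialog scores are all assumptions made for illustration, not the author's method or data.

```python
from statistics import mean, stdev


def describe(scores):
    """Basic descriptive statistics for a list of per-dialog scores."""
    return {"n": len(scores), "mean": mean(scores), "stdev": stdev(scores)}


def compare_to_gold(system_scores, wizard_scores):
    """Summarize system performance relative to the wizard benchmark.

    Assumption: the same metric was scored for dialogs handled by the
    human wizard (the "gold standard") and by the automated system.
    """
    system = describe(system_scores)
    gold = describe(wizard_scores)
    return {
        "system": system,
        "gold": gold,
        # Absolute shortfall of the system against the benchmark.
        "gap": gold["mean"] - system["mean"],
        # Fraction of benchmark performance the system achieves.
        "ratio": system["mean"] / gold["mean"],
    }


# Hypothetical per-dialog scores (e.g. task success on a 0..1 scale).
# These values are invented for the example only.
wizard_scores = [0.9, 1.0, 0.8, 0.9]
system_scores = [0.6, 0.7, 0.5, 0.6]

summary = compare_to_gold(system_scores, wizard_scores)
print(summary)
```

Reporting the system's score as a gap or ratio against the wizard's score, rather than as a raw number, gives the figure a reference point: it shows how far the system is from what an idealized operator achieved on the same task.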