Assessing user simulation for dialog systems using human judges and automatic evaluation measures

  • Authors:
  • Hua Ai; Diane Litman

  • Affiliations:
  • Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260, USA; e-mail: hua@cs.pitt.edu, litman@cs.pitt.edu, iamhuaai@gmail.com

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2011

Abstract

As more user simulations are built to assist dialog system development, there is an increasing need to assess the quality of these simulations quickly and reliably. Previous studies have proposed several automatic evaluation measures for this purpose, but the validity of these measures has not been fully established. We present an assessment study in which human judgments of user simulation quality are collected as the gold standard for validating the automatic evaluation measures. We show that a ranking model can be built from the automatic measures to predict rankings of the simulations in the same order as the human judgments. We further show that the ranking model can be improved by adding a simple feature based on time-series analysis.
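
The sketch below is a minimal, hypothetical illustration of the kind of ranking approach the abstract describes: each user simulation is represented by a vector of automatic evaluation measures, and a pairwise model is trained to reproduce a human-judged ranking. All feature values, the human ranking, and the choice of a logistic-regression pairwise ranker are invented for illustration and are not taken from the paper.

```python
# Illustrative sketch (assumptions): automatic measures and human ranks are
# made-up data; the pairwise logistic-regression ranker is one common way to
# learn a ranking, not necessarily the model used in the paper.
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

# Hypothetical automatic measures for 4 user simulations
# (e.g., dialog-act precision, recall, utterance-length similarity).
measures = np.array([
    [0.62, 0.55, 0.71],
    [0.80, 0.73, 0.65],
    [0.45, 0.40, 0.50],
    [0.77, 0.70, 0.90],
])
human_rank = np.array([2, 1, 3, 0])  # 0 = judged best by humans (invented)

# Build pairwise training data: for each pair (i, j), the label indicates
# whether simulation i is ranked above simulation j by the human judges.
X_pairs, y_pairs = [], []
for i, j in combinations(range(len(measures)), 2):
    X_pairs.append(measures[i] - measures[j])
    y_pairs.append(int(human_rank[i] < human_rank[j]))
    X_pairs.append(measures[j] - measures[i])
    y_pairs.append(int(human_rank[j] < human_rank[i]))

clf = LogisticRegression().fit(np.array(X_pairs), np.array(y_pairs))

# Score each simulation with the learned weights and rank accordingly;
# agreement with human_rank can then be checked (e.g., via rank correlation).
scores = measures @ clf.coef_.ravel()
predicted_order = np.argsort(-scores)
print("Predicted ranking (best first):", predicted_order)
```

In this setup, an additional time-series-style feature (for example, a statistic summarizing how a measure evolves over the course of a dialog) would simply be appended as an extra column of `measures`; the abstract reports that such a feature improves the ranking model, though the specific feature used there is not shown here.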