While a variety of user simulations are built to assist dialog system development, there is an increasing need to assess their quality quickly and reliably. Previous studies have proposed several automatic evaluation measures for this purpose, but the validity of these measures has not been fully established. We present an assessment study that collects human judgments of user simulation quality as a gold standard for validating the automatic evaluation measures. We show that a ranking model built from the automatic measures can rank the simulations in the same order as the human judgments. We further show that the ranking model can be improved by adding a simple feature derived from time-series analysis.
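The abstract does not specify which ranking algorithm underlies the model, so the following is only a minimal sketch of the general idea: combining several automatic evaluation measures into a single score whose ordering over simulations is trained to agree with a human-judged ranking. The simulation names, feature values, and the pairwise-perceptron trainer below are all illustrative assumptions, not the paper's implementation.

    # Hypothetical sketch: learning to rank user simulations from automatic
    # evaluation measures, trained against a human-judged gold ranking.
    from itertools import combinations

    # Each simulation is described by automatic measures (invented values),
    # e.g. a dialog-length statistic, a text-overlap score, a time-series fit.
    simulations = {
        "sim_A": [0.62, 0.41, 0.75],
        "sim_B": [0.55, 0.48, 0.60],
        "sim_C": [0.30, 0.35, 0.40],
    }

    # Gold-standard ranking from human judges, best first (invented).
    human_ranking = ["sim_A", "sim_B", "sim_C"]

    def score(weights, feats):
        """Linear ranking score: weighted sum of the automatic measures."""
        return sum(w * f for w, f in zip(weights, feats))

    def train(simulations, ranking, epochs=50, lr=0.1):
        """Pairwise perceptron: nudge the weights until every better-ranked
        simulation outscores every worse-ranked one."""
        n = len(next(iter(simulations.values())))
        weights = [0.0] * n
        for _ in range(epochs):
            for better, worse in combinations(ranking, 2):
                fb, fw = simulations[better], simulations[worse]
                if score(weights, fb) <= score(weights, fw):
                    weights = [w + lr * (b - v)
                               for w, b, v in zip(weights, fb, fw)]
        return weights

    weights = train(simulations, human_ranking)
    predicted = sorted(simulations,
                       key=lambda s: score(weights, simulations[s]),
                       reverse=True)
    print(predicted)  # ideally matches human_ranking

In practice one would train on held-out human rankings and compare the predicted ordering to the human one with a rank-correlation statistic; this tiny example only checks that the learned weights reproduce the gold ordering on the training data.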