While a variety of user simulations are built to assist dialog system development, there is an increasing need to assess their quality quickly and reliably. Previous studies have proposed several automatic evaluation measures for this purpose, but the validity of these measures has not been fully established. We present an assessment study that collects human judgments of user simulation quality as a gold standard for validating the automatic evaluation measures. We show that a ranking model built from the automatic measures can rank the simulations in the same order as the human judgments. We further show that the ranking model can be improved by adding a simple feature derived from time-series analysis.
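The abstract does not specify which ranking algorithm underlies the model, so the following is only a minimal sketch of the general idea: combining several automatic evaluation measures into a single score whose ordering over simulations is trained to agree with a human-judged ranking. The simulation names, feature values, and the pairwise-perceptron trainer below are all illustrative assumptions, not the paper's implementation.

    # Hypothetical sketch: learning to rank user simulations from automatic
    # evaluation measures, trained against a human-judged gold ranking.
    from itertools import combinations

    # Each simulation is described by automatic measures (invented values),
    # e.g. a dialog-length statistic, a text-overlap score, a time-series fit.
    simulations = {
        "sim_A": [0.62, 0.41, 0.75],
        "sim_B": [0.55, 0.48, 0.60],
        "sim_C": [0.30, 0.35, 0.40],
    }

    # Gold-standard ranking from human judges, best first (invented).
    human_ranking = ["sim_A", "sim_B", "sim_C"]

    def score(weights, feats):
        """Linear ranking score: weighted sum of the automatic measures."""
        return sum(w * f for w, f in zip(weights, feats))

    def train(simulations, ranking, epochs=50, lr=0.1):
        """Pairwise perceptron: nudge the weights until every better-ranked
        simulation outscores every worse-ranked one."""
        n = len(next(iter(simulations.values())))
        weights = [0.0] * n
        for _ in range(epochs):
            for better, worse in combinations(ranking, 2):
                fb, fw = simulations[better], simulations[worse]
                if score(weights, fb) <= score(weights, fw):
                    weights = [w + lr * (b - v)
                               for w, b, v in zip(weights, fb, fw)]
        return weights

    weights = train(simulations, human_ranking)
    predicted = sorted(simulations,
                       key=lambda s: score(weights, simulations[s]),
                       reverse=True)
    print(predicted)  # ideally matches human_ranking

In practice one would train on held-out human rankings and compare the predicted ordering to the human one with a rank-correlation statistic; this tiny example only checks that the learned weights reproduce the gold ordering on the training data.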