Spoken Dialog Challenge 2010: comparison of live and control test results

  • Authors:
  • Alan W. Black; Susanne Burger; Alistair Conkie; Helen Hastie; Simon Keizer; Oliver Lemon; Nicolas Merigaud; Gabriel Parent; Gabriel Schubiner; Blaise Thomson; Jason D. Williams; Kai Yu; Steve Young; Maxine Eskenazi

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh; Carnegie Mellon University, Pittsburgh; AT&T Labs -- Research, Florham Park, NJ; Heriot-Watt University, Edinburgh, UK; Cambridge University, Cambridge, UK; Heriot-Watt University, Edinburgh, UK; Heriot-Watt University, Edinburgh, UK; Carnegie Mellon University, Pittsburgh; Carnegie Mellon University, Pittsburgh; Cambridge University, Cambridge, UK; AT&T Labs -- Research, Florham Park, NJ; Cambridge University, Cambridge, UK; Cambridge University, Cambridge, UK; Carnegie Mellon University, Pittsburgh

  • Venue:
  • SIGDIAL '11: Proceedings of the SIGDIAL 2011 Conference
  • Year:
  • 2011

Abstract

The Spoken Dialog Challenge 2010 was an exercise to investigate how different spoken dialog systems perform on the same task. The existing Let's Go Pittsburgh Bus Information System was used as the task, and four teams provided systems that were first tested under controlled conditions with speech researchers as users. The three most stable systems were then deployed to real callers. This paper presents the results of the live tests and compares them with the control test results. The results show considerable variation both between systems and between the control and live tests. Interestingly, relatively high task completion in the control tests did not always predict relatively high task completion in the live tests. Moreover, even though the systems were quite different in their designs, very similar correlations between word error rate and task completion were observed for all the systems. The dialog data collected is available to the research community.
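
The abstract reports a consistent relationship between word error rate and task completion across systems. As a minimal, hypothetical sketch of how such a per-dialog correlation might be computed (the field names and data below are illustrative, not taken from the released corpus or the authors' analysis):

```python
# Illustrative sketch only: estimating the relationship between per-dialog
# word error rate (WER) and binary task completion, assuming each dialog
# record provides these two fields.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-dialog records: WER in [0, 1], completion coded as 0/1.
dialogs = [
    {"wer": 0.05, "completed": 1},
    {"wer": 0.20, "completed": 1},
    {"wer": 0.45, "completed": 0},
    {"wer": 0.60, "completed": 0},
]

wers = [d["wer"] for d in dialogs]
completions = [d["completed"] for d in dialogs]
print(f"WER vs. task completion correlation: {pearson_r(wers, completions):.2f}")
```

With a binary completion outcome, this Pearson coefficient is the point-biserial correlation; a negative value would indicate that dialogs with higher recognition error rates are completed less often.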