In this paper, we compare different approaches for predicting the quality and usability of spoken dialogue systems. The respective models estimate user judgments of perceived quality from parameters that can be extracted from interaction logs. Different types of input parameters and different modeling algorithms have been compared using three spoken dialogue databases obtained with two different systems. The results show that both linear regression models and classification trees are able to cover around 50% of the variance in the training data, and neural networks even more. When applied to independent test data, in particular to data obtained with different systems and/or user groups, the prediction accuracy decreases significantly. The underlying reasons for this limited predictive power are discussed. It is shown that, although an accurate prediction of individual ratings is not yet possible with such models, they may still be used for making decisions on component optimization, and are thus helpful tools for the system developer.
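The evaluation setup described above can be sketched in miniature: fit a model on logged interaction parameters paired with user ratings, then measure how much of the rating variance it covers on the training data versus on independent data. The following is a minimal illustration with a single invented parameter (dialogue length in turns) and invented ratings; the actual models in the paper use many interaction parameters and several learning algorithms.

```python
# Minimal sketch with invented numbers: predict a user quality rating
# (1-5 scale) from one interaction parameter via ordinary least squares,
# then compare variance coverage (R^2) on training vs. independent data.

def fit_linear(xs, ys):
    """Ordinary least-squares fit for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys covered by the model."""
    my = sum(ys) / len(ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst

# Hypothetical training data: number of dialogue turns vs. user rating.
train_x = [5, 10, 15, 20, 25, 30]
train_y = [4.6, 4.1, 3.2, 3.1, 2.4, 2.1]

# Hypothetical held-out data from a different system / user group.
test_x = [8, 16, 24]
test_y = [3.5, 3.4, 2.0]

slope, intercept = fit_linear(train_x, train_y)
r2_train = r_squared(train_x, train_y, slope, intercept)
r2_test = r_squared(test_x, test_y, slope, intercept)

print(f"R^2 on training data:    {r2_train:.2f}")
print(f"R^2 on independent data: {r2_test:.2f}")
# Mirrors the paper's finding: variance coverage drops markedly when the
# model is applied to data from a different system or user group.
```

The drop from `r2_train` to `r2_test` in this toy setup illustrates the generalization problem the abstract points to: a model tuned to one system's interaction logs covers far less variance on judgments collected elsewhere.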