Towards developing general models of usability with PARADISE
Natural Language Engineering
PARADISE: a framework for evaluating spoken dialogue agents
ACL '97 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Speech Quality of VoIP: Assessment and Prediction
Predicting the quality and usability of spoken dialogue services
Speech Communication
Detecting Problematic Dialogs with Automated Agents
PIT '08 Proceedings of the 4th IEEE tutorial and research workshop on Perception and Interactive Technologies for Speech-Based Systems: Perception in Multimodal Dialogue Systems
Analysis of a new simulation approach to dialog system evaluation
Speech Communication
User simulation as testing for spoken dialog systems
SIGdial '08 Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue
Modeling user satisfaction with Hidden Markov Model
SIGDIAL '09 Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Quality of Telephone-Based Spoken Dialogue Systems
So far, predictions of user quality judgments in response to spoken dialog systems have been made on the basis of interaction parameters describing the dialog, e.g. in the PARADISE framework. These parameters do not take into account the temporal position of events occurring in the dialog. It therefore seems promising to apply sequence classification algorithms to the raw annotations of the data instead of to interaction parameters describing the overall dialog. As dialogs can differ greatly in length, Hidden Markov Models (HMMs) and Markov Chains (MCs) are well suited: they describe the probability of transitioning to a state given only the previous state and the transition probabilities, so they can be trained on, and applied to, sequences of different lengths. This paper analyzes the feasibility of predicting user judgments with HMMs and MCs. To test the models, we acquire data with different types of users, constraining them to conduct interactions that are as similar as possible, and asking for user judgments after each turn. This allows us to compare predicted distributions of judgments with the distributions measured empirically. We also apply the models to less rich corpora and compare their results with those of Linear Regression models as used in the PARADISE framework.
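The Markov Chain variant described in the abstract can be illustrated with a minimal sketch. The example below is not the authors' implementation; the judgment scale, the toy dialogs, and the helper names (`train_markov_chain`, `predict_distribution`) are assumptions for illustration. It estimates initial and transition probabilities from per-turn judgment sequences of varying length, then propagates the distribution forward to predict the judgment distribution after a given number of turns:

```python
from collections import Counter

# Hypothetical training data: per-turn user judgments on a 1..5 scale,
# one sequence per dialog; sequences may have different lengths.
dialogs = [
    [3, 3, 4, 4, 5],
    [3, 2, 2, 3],
    [4, 4, 3, 4, 4, 5],
]

STATES = [1, 2, 3, 4, 5]

def train_markov_chain(sequences, states):
    """Estimate initial and transition probabilities (add-one smoothing)."""
    init = Counter()
    trans = {s: Counter() for s in states}
    for seq in sequences:
        init[seq[0]] += 1
        for prev, cur in zip(seq, seq[1:]):
            trans[prev][cur] += 1
    n = len(states)
    pi = {s: (init[s] + 1) / (len(sequences) + n) for s in states}
    P = {s: {t: (trans[s][t] + 1) / (sum(trans[s].values()) + n)
             for t in states}
         for s in states}
    return pi, P

def predict_distribution(pi, P, num_turns, states):
    """Predicted distribution over judgments after num_turns turns."""
    dist = dict(pi)
    for _ in range(num_turns - 1):
        dist = {t: sum(dist[s] * P[s][t] for s in states) for t in states}
    return dist

pi, P = train_markov_chain(dialogs, STATES)
dist = predict_distribution(pi, P, num_turns=4, states=STATES)
```

Because training and prediction only involve pairwise transitions, the same model handles dialogs of any length, which is the property the abstract highlights; an HMM extends this by treating the judgment as a hidden state emitting observable dialog events.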