User simulations for context-sensitive speech recognition in spoken dialogue systems

  • Authors:
  • Oliver Lemon; Ioannis Konstas

  • Affiliations:
  • Edinburgh University;University of Glasgow

  • Venue:
  • EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
  • Year:
  • 2009

Abstract

We use a machine learner trained on a combination of acoustic and contextual features to predict the accuracy of incoming n-best automatic speech recognition (ASR) hypotheses to a spoken dialogue system (SDS). Our novel approach is to use a simple statistical User Simulation (US) for this task, which measures the likelihood that the user would say each hypothesis in the current context. Such US models are now common in machine learning approaches to SDS, are trained on real dialogue data, and are related to theories of "alignment" in psycholinguistics. We use a US to predict the user's next dialogue move and thereby re-rank n-best hypotheses of a speech recognizer for a corpus of 2564 user utterances. The method achieved a significant relative reduction of Word Error Rate (WER) of 5% (this is 44% of the possible WER improvement on this data), and 62% of the possible semantic improvement (Dialogue Move Accuracy), compared to the baseline policy of selecting the topmost ASR hypothesis. The majority of the improvement is attributable to the User Simulation feature, as shown by Information Gain analysis.
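The re-ranking idea in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the scoring function, the interpolation weight `alpha`, and the toy user-simulation model are all assumptions introduced here for clarity.

```python
# Hedged sketch: re-rank n-best ASR hypotheses by combining each hypothesis's
# acoustic confidence with a User Simulation probability that the user would
# say it in the current dialogue context. All names here are hypothetical.

def rerank(nbest, us_prob, context, alpha=0.5):
    """nbest: list of (hypothesis, asr_confidence) pairs.
    us_prob(hyp, context): assumed US model returning a value in [0, 1].
    alpha: interpolation weight (an assumption, not from the paper)."""
    scored = [
        (alpha * conf + (1 - alpha) * us_prob(hyp, context), hyp)
        for hyp, conf in nbest
    ]
    return max(scored)[1]  # hypothesis with the highest combined score

def toy_us(hyp, context):
    """Toy stand-in for a statistical User Simulation: in a 'confirm'
    context, the user is far more likely to utter yes/no responses."""
    expected = {"confirm": {"yes", "no"}, "ask_city": {"paris", "london"}}
    return 0.9 if hyp in expected.get(context, set()) else 0.1

# Two acoustically confusable hypotheses; context pushes the choice to "yes".
nbest = [("us", 0.55), ("yes", 0.50)]
print(rerank(nbest, toy_us, "confirm"))  # prints "yes"
```

The point of the sketch is that the topmost (baseline) hypothesis `"us"` loses to `"yes"` once dialogue context is taken into account, which is the mechanism behind the WER and Dialogue Move Accuracy gains reported above.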