Optimizing the turn-taking behavior of task-oriented spoken dialog systems

  • Authors:
  • Antoine Raux;Maxine Eskenazi

  • Affiliations:
  • Honda Research Institute USA, Mountain View, CA;Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • ACM Transactions on Speech and Language Processing (TSLP)
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Even as progress in speech technologies and task and dialog modeling has allowed the development of advanced spoken dialog systems, the low-level interaction behavior of those systems often remains rigid and inefficient. Based on an analysis of human-human and human-computer turn-taking in naturally occurring task-oriented dialogs, we define a set of features that can be automatically extracted and show that they can be used to inform efficient end-of-turn detection. We then frame turn-taking as decision making under uncertainty and describe the Finite-State Turn-Taking Machine (FSTTM), a decision-theoretic model that combines data-driven machine learning methods and a cost structure derived from Conversation Analysis to control the turn-taking behavior of dialog systems. Evaluation results on CMU Let's Go, a publicly deployed bus information system, confirm that the FSTTM significantly improves the responsiveness of the system compared to a standard threshold-based approach, as well as previous data-driven methods.