Training a real-world POMDP-based dialogue system

  • Authors:
  • Blaise Thomson, Jost Schatzmann, Karl Weilhammer, Hui Ye, Steve Young

  • Affiliations:
  • Cambridge University, Cambridge, United Kingdom (all authors)

  • Venue:
  • NAACL-HLT-Dialog '07 Proceedings of the Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies
  • Year:
  • 2007

Abstract

Partially Observable Markov Decision Processes (POMDPs) provide a principled way to model uncertainty in dialogue. However, traditional algorithms for optimising POMDP policies are intractable except in cases with very few states. This paper presents a new approach to policy optimisation based on grid-based Q-learning over a summary of the belief space. We also present a technique for bootstrapping the system using a novel agenda-based user model. A policy trained with this system was tested with human subjects in an extensive trial and gave highly competitive results, with a 90.6% task completion rate.
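The core idea the abstract mentions — Q-learning over a discretised summary of the belief space rather than the full belief space — can be sketched as follows. This is a minimal toy illustration, not the paper's actual system: the slot-filling task, the `GOALS` and `ACTIONS` sets, the noise level, and the reward values are all illustrative assumptions.

```python
import random

# Toy task: the user has a hidden goal; the system asks noisy questions
# and maintains a belief (probability distribution) over goals.
GOALS = ["a", "b", "c"]
ACTIONS = ["ask", "confirm", "submit"]
NOISE = 0.2  # probability that an observation of the goal is wrong

def update_belief(belief, obs):
    # Bayes update of the belief given a noisy observation of the goal.
    new = {}
    for g in GOALS:
        like = (1 - NOISE) if g == obs else NOISE / (len(GOALS) - 1)
        new[g] = belief[g] * like
    z = sum(new.values())
    return {g: p / z for g, p in new.items()}

def summarise(belief, cells=5):
    # Summary space: map the continuous belief to a small discrete grid
    # by bucketing the probability of the most likely goal.
    top = max(belief.values())
    return min(int(top * cells), cells - 1)

def run_episode(Q, epsilon=0.1, alpha=0.2, gamma=0.95, max_turns=20):
    # One dialogue episode with epsilon-greedy grid-based Q-learning.
    goal = random.choice(GOALS)
    belief = {g: 1.0 / len(GOALS) for g in GOALS}
    total = 0.0
    for _ in range(max_turns):
        s = summarise(belief)
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q.get((s, x), 0.0))
        if a == "submit":
            # Terminal action: reward depends on whether the top
            # hypothesis matches the hidden goal.
            guess = max(belief, key=belief.get)
            r = 10.0 if guess == goal else -10.0
            total += r
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r - Q.get((s, a), 0.0))
            break
        # ask/confirm cost one turn and yield a noisy observation.
        r = -1.0
        total += r
        if random.random() > NOISE:
            obs = goal
        else:
            obs = random.choice([g for g in GOALS if g != goal])
        belief = update_belief(belief, obs)
        s2 = summarise(belief)
        best_next = max(Q.get((s2, x), 0.0) for x in ACTIONS)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
    return total

if __name__ == "__main__":
    random.seed(0)
    Q = {}
    for _ in range(2000):
        run_episode(Q)
    print(f"learned {len(Q)} summary-state/action values")
```

The point of the summary space is that the Q-table is indexed by a handful of grid cells instead of the continuous belief simplex, which is what makes the learning tractable; the paper's bootstrapping with an agenda-based simulated user plays the role of the random-goal environment above.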