Partially Observable Markov Decision Processes (POMDPs) provide a principled way to model uncertainty in dialogue. However, traditional algorithms for optimising policies are intractable except for cases with very few states. This paper presents a new approach to policy optimisation based on grid-based Q-learning over a summary of the belief space. We also present a technique for bootstrapping the system using a novel agenda-based user model. A policy trained with this approach was evaluated with human subjects in an extensive trial and gave highly competitive results, achieving a 90.6% task completion rate.
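The core idea of grid-based Q-learning over a summary belief space can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two-component summary (probabilities of the top two belief hypotheses), the grid resolution, and the action set are all hypothetical choices made for the example.

```python
# Hypothetical summary space: (prob. of top hypothesis, prob. of second
# hypothesis), quantised to a 0.1-spaced grid of valid points.
GRID = [(round(p1, 1), round(p2, 1))
        for p1 in [i / 10 for i in range(11)]
        for p2 in [i / 10 for i in range(11)]
        if p1 + p2 <= 1.0 and p2 <= p1]

ACTIONS = ["ask", "confirm", "submit"]  # illustrative summary actions

def nearest_grid_point(summary):
    """Map a continuous summary belief to its nearest grid point."""
    return min(GRID, key=lambda g: (g[0] - summary[0]) ** 2
                                 + (g[1] - summary[1]) ** 2)

# Tabular Q-values, one entry per (grid point, action) pair.
Q = {(g, a): 0.0 for g in GRID for a in ACTIONS}

def q_update(summary, action, reward, next_summary,
             alpha=0.1, gamma=0.95):
    """One Q-learning step: project both belief summaries onto the grid,
    then apply the standard tabular temporal-difference update."""
    g = nearest_grid_point(summary)
    g_next = nearest_grid_point(next_summary)
    best_next = max(Q[(g_next, a)] for a in ACTIONS)
    Q[(g, action)] += alpha * (reward + gamma * best_next - Q[(g, action)])
```

The grid keeps the value table finite despite the continuous belief space; during training, each observed (belief, action, reward, next belief) transition from simulated or real dialogues drives one `q_update` call.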