A partially observable Markov decision process (POMDP) has been proposed as a dialog model that enables automatic optimization of the dialog policy and provides robustness to speech understanding errors. Various approximations allow such a model to be used for building real-world dialog systems. However, they require a large number of dialogs to train the dialog policy, and hence they typically rely on the availability of a user simulator. They also require significant designer effort to hand-craft the policy representation. We investigate the use of Gaussian processes (GPs) in policy modeling to overcome these problems. We show that GP policy optimization can be implemented for a real-world POMDP dialog manager, and in particular: 1) we examine different formulations of a GP policy to minimize variability in the learning process; 2) we find that the use of GPs increases the learning rate by an order of magnitude, thereby allowing learning by direct interaction with human users; and 3) we demonstrate that designer effort can be substantially reduced by basing the policy directly on the full belief space, thereby avoiding ad hoc feature space modeling. Overall, the GP approach represents an important step towards fully automatic dialog policy optimization in real-world systems.
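To make the idea concrete, the sketch below shows how a GP can model the Q-function of a dialog policy directly over belief-state vectors, with one GP regressor per action and a greedy action selector. This is a minimal illustration of the general technique, not the paper's actual GP-SARSA implementation: the squared-exponential kernel, the hyperparameter values, and the per-action decomposition are illustrative assumptions, and a real dialog manager would use temporal-difference updates and sparse approximations rather than exact GP regression on returns.

```python
import numpy as np

def rbf_kernel(B1, B2, lengthscale=0.5, variance=1.0):
    # Squared-exponential kernel over belief-state vectors (an assumed choice;
    # the paper discusses kernels suited to the full belief space).
    d2 = (np.sum(B1**2, axis=1)[:, None]
          + np.sum(B2**2, axis=1)[None, :]
          - 2.0 * B1 @ B2.T)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

class GPQFunction:
    """One GP regressor per action, mapping belief states to Q-value estimates.

    The GP posterior variance gives an uncertainty estimate for each
    (belief, action) pair, which is what enables the fast, sample-efficient
    learning described in the abstract.
    """

    def __init__(self, n_actions, noise=0.1):
        self.n_actions = n_actions
        self.noise = noise
        # Per-action training data: (belief points, observed returns).
        self.data = {a: ([], []) for a in range(n_actions)}

    def update(self, belief, action, target):
        X, y = self.data[action]
        X.append(np.asarray(belief, dtype=float))
        y.append(float(target))

    def mean_and_var(self, belief, action):
        X, y = self.data[action]
        if not X:
            return 0.0, 1.0  # GP prior mean/variance before any data
        B = np.stack(X)
        b = np.asarray(belief, dtype=float)[None, :]
        K = rbf_kernel(B, B) + self.noise**2 * np.eye(len(X))
        k = rbf_kernel(B, b)
        alpha = np.linalg.solve(K, np.array(y))
        mean = float(k.T @ alpha)
        var = float(1.0 - k.T @ np.linalg.solve(K, k))
        return mean, max(var, 0.0)

    def greedy_action(self, belief):
        # Exploit: pick the action with the highest posterior mean Q-value.
        means = [self.mean_and_var(belief, a)[0]
                 for a in range(self.n_actions)]
        return int(np.argmax(means))
```

In use, each completed dialog turn supplies a (belief, action, return) triple to `update`, and the manager acts via `greedy_action` (or an uncertainty-aware variant using the posterior variance for exploration). Because the kernel operates on the belief vector itself, no hand-crafted feature space is needed, which is the designer-effort reduction the abstract highlights.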