This article presents a novel algorithm for learning parameters in statistical dialogue systems modeled as Partially Observable Markov Decision Processes (POMDPs). A POMDP dialogue manager has three main components: a dialogue model representing dialogue state information, a policy that selects the system's responses based on the inferred state, and a reward function that specifies the desired behavior of the system. Ideally, both the model parameters and the policy would be designed to maximize the cumulative reward. However, while many techniques are available for learning the optimal policy, no method for learning the optimal model parameters has yet been shown to scale to real-world dialogue systems. The presented algorithm, called the Natural Actor and Belief Critic (NABC), is a policy gradient method that addresses this problem. Based on observed rewards, the algorithm estimates the natural gradient of the expected cumulative reward, and the resulting gradient is used to adapt both the prior distribution of the dialogue model parameters and the policy parameters. In addition, the article presents a variant of the NABC algorithm, called the Natural Belief Critic (NBC), which assumes that the policy is fixed and only the model parameters need to be estimated. The algorithms are evaluated on a spoken dialogue system in the tourist information domain. The experiments show that model parameters estimated to maximize the expected cumulative reward yield significantly better performance than the baseline hand-crafted model parameters. The algorithms are also compared to optimization techniques using plain gradients and to state-of-the-art random search algorithms; in all cases, the algorithms based on the natural gradient perform significantly better.
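The central mechanism the abstract describes — estimating the natural gradient of the expected cumulative reward from sampled episodes and using it to update parameters — can be illustrated on a toy problem. The sketch below is not the authors' NABC implementation; it is a minimal, assumption-laden example with a 1-D Gaussian policy and a synthetic reward (all names and constants are hypothetical), showing how the vanilla score-function gradient is preconditioned by the inverse empirical Fisher matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (hypothetical): Gaussian policy over a 1-D action with a fixed
# standard deviation; the reward peaks at action = 2.0. The policy parameter
# theta is the mean of the action distribution.
theta = np.array([0.0])
std = 1.0

def sample_episode(theta):
    """Sample one 'episode': an action, its reward, and the score function."""
    a = rng.normal(theta[0], std)
    r = -(a - 2.0) ** 2                          # reward: highest near a = 2
    score = np.array([(a - theta[0]) / std**2])  # d log pi(a|theta) / d theta
    return r, score

for step in range(200):
    grad = np.zeros(1)
    fisher = np.zeros((1, 1))
    n = 50
    for _ in range(n):
        r, s = sample_episode(theta)
        grad += r * s                 # Monte Carlo policy-gradient term
        fisher += np.outer(s, s)      # empirical Fisher information
    grad /= n
    fisher = fisher / n + 1e-6 * np.eye(1)  # regularize for invertibility
    # Natural gradient: precondition the plain gradient with F^{-1}.
    natural_grad = np.linalg.solve(fisher, grad)
    theta += 0.05 * natural_grad
```

After the loop, the policy mean `theta[0]` sits close to the reward optimum at 2.0. The same Fisher-preconditioning idea is what distinguishes the natural-gradient methods in the article from the plain-gradient baselines they are compared against.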