ACM Transactions on Speech and Language Processing (TSLP)
Reinforcement learning techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy, which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally, the parameters of this dialogue model should also be optimised to maximise the expected cumulative reward. This article presents two novel reinforcement learning algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system, modelled as a Partially Observable Markov Decision Process, in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward provide improved performance compared to the baseline handcrafted parameters.
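The algorithms above build on natural-gradient policy optimisation. As a rough illustration of the core idea (not the paper's actual method, which operates on POMDP belief states and dialogue-model parameters), the sketch below applies a natural-gradient update to a softmax policy in a toy single-state "dialogue" with three candidate system actions; the reward values, sample counts, and learning rate are all invented for the example. The natural gradient preconditions the vanilla policy gradient with the inverse Fisher information matrix, making the update invariant to how the policy is parameterised.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: one dialogue state, three candidate system actions,
# each with a fixed (unknown to the learner) expected reward.
TRUE_REWARDS = np.array([0.1, 0.5, 1.0])


def softmax(theta):
    z = theta - theta.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def natural_gradient_step(theta, n_samples=500, lr=0.5, damping=1e-3):
    """One natural-gradient ascent step on expected reward."""
    p = softmax(theta)
    grads, rewards = [], []
    for _ in range(n_samples):
        a = rng.choice(len(p), p=p)                  # sample an action
        r = TRUE_REWARDS[a] + rng.normal(0.0, 0.1)   # noisy reward
        g = -p.copy()
        g[a] += 1.0                                  # grad of log softmax
        grads.append(g)
        rewards.append(r)
    grads = np.array(grads)
    rewards = np.array(rewards)

    baseline = rewards.mean()        # variance-reduction baseline
    vanilla = (grads * (rewards - baseline)[:, None]).mean(axis=0)

    # Empirical Fisher information: E[grad grad^T], damped for stability.
    fisher = (grads[:, :, None] * grads[:, None, :]).mean(axis=0)
    fisher += damping * np.eye(len(theta))

    # Natural gradient = F^{-1} * vanilla gradient.
    nat_grad = np.linalg.solve(fisher, vanilla)
    return theta + lr * nat_grad


theta = np.zeros(3)
for _ in range(30):
    theta = natural_gradient_step(theta)

# The policy should now concentrate on the highest-reward action.
best_action = int(np.argmax(softmax(theta)))
```

The same preconditioning idea underlies the Natural Belief Critic: instead of (or in addition to) the policy parameters `theta`, the gradient is taken with respect to the dialogue-model parameters that drive belief-state inference.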