Spoken Dialogue Systems (SDS) are systems that interact with human beings using natural language. The dialogue policy plays a crucial role in determining the behavior of the dialogue management module. Handcrafting the dialogue policy is not always an option, given the complexity of the dialogue task and the stochastic behavior of users. In recent years, approaches based on Reinforcement Learning (RL) have proved efficient for dialogue policy optimization. Yet most conventional RL algorithms are data intensive and require techniques such as user simulation, which introduces additional modeling errors. This paper explores the use of a set of approximate dynamic programming algorithms for policy optimization in SDS. These algorithms are combined with a method for learning a sparse representation of the value function. Experimental results show that, when applied to dialogue management optimization, these algorithms are particularly sample efficient, learning from a few hundred dialogue examples. They also learn in an off-policy manner, meaning that they can learn optimal policies from dialogue examples generated with a quite simple strategy. They can therefore learn good dialogue policies directly from data, avoiding user modeling errors.
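As a rough illustration of the kind of off-policy, batch approximate dynamic programming the abstract refers to, the sketch below implements least-squares policy iteration (LSPI) with a linear value-function approximation, fitted on a fixed batch of dialogue transitions. The feature map `phi`, the function names, and the hyperparameter values are illustrative assumptions; the sparse value-function representation described in the paper is not reproduced here.

```python
import numpy as np

def lstdq(transitions, policy, phi, n_features, gamma=0.95, reg=1e-3):
    """One LSTD-Q evaluation step: fit linear Q-function weights for `policy`
    from a fixed batch of (state, action, reward, next_state) tuples."""
    A = reg * np.eye(n_features)              # small ridge term keeps A invertible
    b = np.zeros(n_features)
    for s, a, r, s_next in transitions:
        f = phi(s, a)                          # feature vector for the visited (s, a)
        f_next = phi(s_next, policy(s_next))   # features of the next state under the evaluated policy
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)               # Q(s, a) is approximated by phi(s, a) . w

def lspi(transitions, actions, phi, n_features, n_iters=20, gamma=0.95):
    """Least-squares policy iteration on a fixed (off-policy) batch of dialogues."""
    w = np.zeros(n_features)
    greedy = lambda s: max(actions, key=lambda a: phi(s, a) @ w)  # closure sees updated w
    for _ in range(n_iters):
        w = lstdq(transitions, greedy, phi, n_features, gamma)    # evaluate, then improve
    return w, greedy
```

In this sketch, the batch of transitions can come from dialogues collected with any simple hand-crafted strategy, since LSTD-Q evaluates the greedy policy rather than the behavior policy; in a sparse variant, `phi` would be built from a small dictionary of representative (state, action) pairs selected from the data rather than a fixed feature grid.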