Designing dialog policies for voice-enabled interfaces is a tailoring job that is most often left to natural language processing experts. This job is generally redone for every new dialog task because cross-domain transfer is not possible. For this reason, machine learning methods for dialog policy optimization have been investigated over the last 15 years. In particular, reinforcement learning (RL) is now part of the state of the art in this domain. Standard RL methods require testing more or less random changes to the policy on users in order to assess whether they are improvements or degradations. This is called on-policy learning. However, it can result in system behaviors that are not acceptable to users. Learning algorithms should ideally infer an optimal strategy by observing interactions generated by a non-optimal but acceptable strategy, that is, by learning off-policy. In this contribution, a sample-efficient, online and off-policy reinforcement learning algorithm is proposed to learn an optimal policy from a few hundred dialogues generated with a very simple handcrafted policy.
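The on-policy/off-policy distinction above can be illustrated with a minimal sketch, assuming a generic batch Q-learning setup rather than the paper's specific algorithm: transitions are logged once under a fixed handcrafted behavior policy, and a better policy is learned from that fixed dataset without ever executing exploratory actions on users. All names, sizes, and the synthetic log are illustrative assumptions.

```python
import numpy as np

# Hedged sketch (not the paper's exact method): off-policy batch
# Q-learning over logged dialogue transitions (s, a, r, s') that were
# collected with a fixed, acceptable handcrafted policy.
N_STATES, N_ACTIONS, GAMMA, ALPHA = 5, 3, 0.95, 0.1

rng = np.random.default_rng(0)
# Synthetic stand-in for a log of a few hundred dialogue turns.
logged = [(rng.integers(N_STATES), rng.integers(N_ACTIONS),
           rng.random(), rng.integers(N_STATES))
          for _ in range(500)]

Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(200):                 # sweep the fixed dataset repeatedly
    for s, a, r, s2 in logged:
        # Off-policy target: max over next actions, independent of the
        # action the logging (behavior) policy actually took next.
        target = r + GAMMA * Q[s2].max()
        Q[s, a] += ALPHA * (target - Q[s, a])

# The improved greedy policy is derived offline; users only ever
# interacted with the handcrafted behavior policy.
greedy_policy = Q.argmax(axis=1)
```

The key point of the sketch is the `max` in the update target: the learned value function evaluates the greedy policy while the data comes from a different, fixed policy, which is exactly what makes the procedure off-policy and safe to run on previously collected dialogues.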