Sample-efficient batch reinforcement learning for dialogue management optimization
ACM Transactions on Speech and Language Processing (TSLP)
This paper investigates the impact of reward shaping on learning in a reinforcement learning-based spoken dialogue system. A diffuse reward function gives a reward after each transition between two dialogue states, whereas a sparse function only gives a reward at the end of the dialogue. Reward shaping consists of learning a diffuse reward function that leaves the optimal policy unchanged with respect to the sparse one. Two reward shaping methods are applied to a corpus of dialogues annotated with numerical performance scores. Learning with these shaped functions is compared to the sparse case, and it is shown, on simulated dialogues, that the policies learnt after reward shaping achieve higher performance.
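The policy-preservation property mentioned above can be illustrated with potential-based reward shaping (Ng, Harada and Russell, 1999), one standard way to build a diffuse reward without changing the optimal policy. The sketch below is illustrative only: the slot-filling dialogue states, the potential function, and the discount factor are assumptions for the example, not details taken from the paper.

```python
# Sketch of potential-based reward shaping for a slot-filling dialogue.
# F(s, s') = gamma * Phi(s') - Phi(s) is added to the sparse reward;
# this form is known to leave the optimal policy unchanged.

GAMMA = 0.99  # assumed discount factor

def potential(state):
    # Hypothetical potential Phi: fraction of dialogue slots filled.
    filled = sum(1 for v in state.values() if v is not None)
    return filled / len(state)

def shaped_reward(sparse_reward, state, next_state, gamma=GAMMA):
    # Diffuse reward: sparse reward plus the shaping term F(s, s').
    return sparse_reward + gamma * potential(next_state) - potential(state)

# Filling one slot yields an immediate shaping bonus even though the
# sparse reward stays 0 until the end of the dialogue.
s  = {"origin": "Paris", "destination": None}
s2 = {"origin": "Paris", "destination": "London"}
print(shaped_reward(0.0, s, s2))  # 0.99 * 1.0 - 0.5 = 0.49
```

Because the shaping terms telescope along any trajectory, the return of every policy shifts by the same state-dependent constant, which is why the diffuse and sparse rewards induce the same optimal policy.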