Sample-efficient batch reinforcement learning for dialogue management optimization
ACM Transactions on Speech and Language Processing (TSLP)
This paper investigates the impact of reward shaping on learning in a reinforcement learning-based spoken dialogue system. A diffuse reward function gives a reward after each transition between two dialogue states, whereas a sparse function only gives a reward at the end of the dialogue. Reward shaping consists of learning a diffuse reward function that leaves the optimal policy unchanged with respect to the sparse one. Two reward shaping methods are applied to a corpus of dialogues annotated with numerical performance scores. Learning with these shaped functions is compared to the sparse case, and it is shown, on simulated dialogues, that the policies learnt after reward shaping achieve higher performance.
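The policy-preservation property mentioned above can be illustrated with potential-based reward shaping (Ng, Harada and Russell, 1999), one standard way to build a diffuse reward without changing the optimal policy. The sketch below is illustrative only: the slot-filling dialogue states, the potential function, and the discount factor are assumptions for the example, not details taken from the paper.

```python
# Sketch of potential-based reward shaping for a slot-filling dialogue.
# F(s, s') = gamma * Phi(s') - Phi(s) is added to the sparse reward;
# this form is known to leave the optimal policy unchanged.

GAMMA = 0.99  # assumed discount factor

def potential(state):
    # Hypothetical potential Phi: fraction of dialogue slots filled.
    filled = sum(1 for v in state.values() if v is not None)
    return filled / len(state)

def shaped_reward(sparse_reward, state, next_state, gamma=GAMMA):
    # Diffuse reward: sparse reward plus the shaping term F(s, s').
    return sparse_reward + gamma * potential(next_state) - potential(state)

# Filling one slot yields an immediate shaping bonus even though the
# sparse reward stays 0 until the end of the dialogue.
s  = {"origin": "Paris", "destination": None}
s2 = {"origin": "Paris", "destination": "London"}
print(shaped_reward(0.0, s, s2))  # 0.99 * 1.0 - 0.5 = 0.49
```

Because the shaping terms telescope along any trajectory, the return of every policy shifts by the same state-dependent constant, which is why the diffuse and sparse rewards induce the same optimal policy.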