Spoken Dialogue Systems (SDS) are systems that interact with human beings using natural language. The dialogue policy plays a crucial role in determining the behavior of the dialogue management module. Handcrafting the dialogue policy is not always an option, given the complexity of the dialogue task and the stochastic behavior of users. In recent years, approaches based on Reinforcement Learning (RL) have proved efficient for dialogue policy optimization. Yet most conventional RL algorithms are data intensive and require techniques such as user simulation, which introduces additional modeling errors. This paper explores the use of a set of approximate dynamic programming algorithms for policy optimization in SDS. These algorithms are combined with a method for learning a sparse representation of the value function. Experimental results show that, when applied to dialogue management optimization, these algorithms are particularly sample efficient, learning from a few hundred dialogue examples. They also learn in an off-policy manner, meaning that they can learn optimal policies from dialogue examples generated with a quite simple strategy. They can therefore learn good dialogue policies directly from data, avoiding user modeling errors.
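As a rough illustration of the kind of off-policy, batch approximate dynamic programming the abstract refers to, the sketch below implements least-squares policy iteration (LSPI) with a linear value-function approximation, fitted on a fixed batch of dialogue transitions. The feature map `phi`, the function names, and the hyperparameter values are illustrative assumptions; the sparse value-function representation described in the paper is not reproduced here.

```python
import numpy as np

def lstdq(transitions, policy, phi, n_features, gamma=0.95, reg=1e-3):
    """One LSTD-Q evaluation step: fit linear Q-function weights for `policy`
    from a fixed batch of (state, action, reward, next_state) tuples."""
    A = reg * np.eye(n_features)              # small ridge term keeps A invertible
    b = np.zeros(n_features)
    for s, a, r, s_next in transitions:
        f = phi(s, a)                          # feature vector for the visited (s, a)
        f_next = phi(s_next, policy(s_next))   # features of the next state under the evaluated policy
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)               # Q(s, a) is approximated by phi(s, a) . w

def lspi(transitions, actions, phi, n_features, n_iters=20, gamma=0.95):
    """Least-squares policy iteration on a fixed (off-policy) batch of dialogues."""
    w = np.zeros(n_features)
    greedy = lambda s: max(actions, key=lambda a: phi(s, a) @ w)  # closure sees updated w
    for _ in range(n_iters):
        w = lstdq(transitions, greedy, phi, n_features, gamma)    # evaluate, then improve
    return w, greedy
```

In this sketch, the batch of transitions can come from dialogues collected with any simple hand-crafted strategy, since LSTD-Q evaluates the greedy policy rather than the behavior policy; in a sparse variant, `phi` would be built from a small dictionary of representative (state, action) pairs selected from the data rather than a fixed feature grid.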