Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, the Kalman Temporal Differences (KTD) framework, which exhibits the following features: sample efficiency, non-linear approximation, handling of non-stationarity, and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDPs); it produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), which handles stochastic MDPs, is described. Convergence is analyzed in special cases for both deterministic and stochastic transitions. The related algorithms are evaluated on classical benchmarks and compare favorably to the state of the art while exhibiting the announced features.
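The core idea behind KTD can be illustrated on a toy case: the value-function parameter is treated as the hidden state of a Kalman filter, and each observed reward drives a correction via the TD innovation. The sketch below is a deliberate simplification (scalar parameter, one state with a self-loop, linear observation model), not the paper's full algorithm; all numerical settings (`gamma`, the noise variances `Q` and `R`) are illustrative assumptions.

```python
# Minimal scalar sketch of a Kalman-filter-style TD update (a hypothetical
# simplification of the KTD idea, not the paper's full algorithm).
# Setting: a single state with a self-loop, reward r = 1, discount
# gamma = 0.9, so the true value is r / (1 - gamma) = 10.
# The parameter theta is the filter's hidden state; the reward is modeled as
#   r_t = H * theta + noise,   with H = phi(s) - gamma * phi(s') and phi = 1.

gamma = 0.9      # discount factor
r = 1.0          # constant reward on the self-loop
H = 1.0 - gamma  # TD "regressor" for the tabular feature phi = 1

theta = 0.0      # parameter estimate (prior mean)
P = 1.0          # parameter variance (prior uncertainty)
Q = 0.01         # process noise: keeps the gain alive, enabling tracking
R = 1.0          # observation-noise variance

for _ in range(2000):
    P_pred = P + Q                  # prediction step (random-walk model)
    S = H * P_pred * H + R          # innovation variance
    K = P_pred * H / S              # Kalman gain
    theta += K * (r - H * theta)    # correction by the TD innovation
    P = P_pred - K * H * P_pred     # posterior variance

# theta converges to the true value 10; P quantifies the remaining
# uncertainty on the estimate (the "uncertainty management" feature).
```

The process noise `Q > 0` keeps the Kalman gain bounded away from zero, which is what lets the filter keep tracking a non-stationary target; setting `Q = 0` recovers a recursive least-squares behavior whose gain vanishes over time.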