Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, the Kalman Temporal Differences (KTD) framework, which exhibits the following features: sample efficiency, non-linear approximation, handling of non-stationarity, and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDPs); it produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), which handles stochastic MDPs, is described. Convergence is analyzed in special cases for both deterministic and stochastic transitions. The related algorithms are evaluated on classical benchmarks and compare favorably to the state of the art while exhibiting the announced features.
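The core idea behind KTD can be illustrated on a toy case: the value-function parameter is treated as the hidden state of a Kalman filter, and each observed reward drives a correction via the TD innovation. The sketch below is a deliberate simplification (scalar parameter, one state with a self-loop, linear observation model), not the paper's full algorithm; all numerical settings (`gamma`, the noise variances `Q` and `R`) are illustrative assumptions.

```python
# Minimal scalar sketch of a Kalman-filter-style TD update (a hypothetical
# simplification of the KTD idea, not the paper's full algorithm).
# Setting: a single state with a self-loop, reward r = 1, discount
# gamma = 0.9, so the true value is r / (1 - gamma) = 10.
# The parameter theta is the filter's hidden state; the reward is modeled as
#   r_t = H * theta + noise,   with H = phi(s) - gamma * phi(s') and phi = 1.

gamma = 0.9      # discount factor
r = 1.0          # constant reward on the self-loop
H = 1.0 - gamma  # TD "regressor" for the tabular feature phi = 1

theta = 0.0      # parameter estimate (prior mean)
P = 1.0          # parameter variance (prior uncertainty)
Q = 0.01         # process noise: keeps the gain alive, enabling tracking
R = 1.0          # observation-noise variance

for _ in range(2000):
    P_pred = P + Q                  # prediction step (random-walk model)
    S = H * P_pred * H + R          # innovation variance
    K = P_pred * H / S              # Kalman gain
    theta += K * (r - H * theta)    # correction by the TD innovation
    P = P_pred - K * H * P_pred     # posterior variance

# theta converges to the true value 10; P quantifies the remaining
# uncertainty on the estimate (the "uncertainty management" feature).
```

The process noise `Q > 0` keeps the Kalman gain bounded away from zero, which is what lets the filter keep tracking a non-stationary target; setting `Q = 0` recovers a recursive least-squares behavior whose gain vanishes over time.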