Adaptive Algorithms and Stochastic Approximations.
The Convergence of TD(λ) for General λ. Machine Learning.
Temporal difference learning and TD-Gammon. Communications of the ACM.
Linear least-squares algorithms for temporal difference learning. Machine Learning (special issue on reinforcement learning).
On the worst-case analysis of temporal-difference learning algorithms. Machine Learning (special issue on reinforcement learning).
Mean-field theory for batched TD(λ). Neural Computation.
On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications.
Dynamic Programming and Optimal Control.
Neuro-Dynamic Programming.
Technical Update: Least-Squares Temporal Difference Learning. Machine Learning.
Learning to Predict by the Methods of Temporal Differences. Machine Learning.
A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal Difference Learning. ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning.
Least-Squares Temporal Difference Learning. ICML '99: Proceedings of the Sixteenth International Conference on Machine Learning.
Relative Loss Bounds for Temporal-Difference Learning. ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning.
Learning and value function approximation in complex decision processes.
Least-squares policy iteration. The Journal of Machine Learning Research.
A reinforcement learning approach to job-shop scheduling. IJCAI '95: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2.
Performance Loss Bounds for Approximate Value Iteration with State Aggregation. Mathematics of Operations Research.
Proceedings of the 24th International Conference on Machine Learning.
A New Learning Algorithm for Optimal Stopping. Discrete Event Dynamic Systems.
Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics.
On Regression-Based Stopping Times. Discrete Event Dynamic Systems.
Error Bounds for Approximations from Projected Linear Equations. Mathematics of Operations Research.
Journal of Artificial Intelligence Research.
Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming. Mathematics of Operations Research.
Recursive least-squares learning with eligibility traces. EWRL '11: Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning.
The traditional Kalman filter can be viewed as a recursive stochastic algorithm that approximates an unknown function via a linear combination of prespecified basis functions, given a sequence of noisy samples. In this paper, we generalize the algorithm to one that approximates the fixed point of an operator known to be a Euclidean-norm contraction. Instead of noisy samples of the desired fixed point, the algorithm updates its parameters based on noisy samples of functions generated by application of the operator, in the spirit of Robbins–Monro stochastic approximation. The algorithm is motivated by temporal-difference learning, and our developments lead to a possibly more efficient variant of temporal-difference learning. We establish convergence of the algorithm and explore efficiency gains through computational experiments involving optimal stopping and queueing problems.
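The Robbins–Monro idea referred to in the abstract can be illustrated with a minimal sketch (not the paper's generalized Kalman filter itself): given only noisy evaluations of a Euclidean-norm contraction `F`, a stochastic-approximation iteration with diminishing step sizes converges to the fixed point of `F`. The matrix `A`, offset `b`, and noise level here are hypothetical choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear contraction F(x) = A x + b with ||A|| < 1 in Euclidean norm.
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(np.eye(2) - A, b)  # exact fixed point, for reference

def noisy_F(x):
    """Noisy sample of the contraction applied at x (we never see F exactly)."""
    return A @ x + b + 0.1 * rng.standard_normal(2)

x = np.zeros(2)
for k in range(20000):
    gamma = 1.0 / (k + 1)              # step sizes satisfy the Robbins-Monro conditions
    x = x + gamma * (noisy_F(x) - x)   # move toward the sampled image of x

print(np.round(x, 2), np.round(x_star, 2))
```

Despite the noise, the diminishing steps average out the sampling error, so the iterate settles near `x_star`; the paper's algorithm pursues the same goal but adapts the update using Kalman-filter-style gain matrices rather than a scalar step size.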