Off-Policy Temporal Difference Learning with Function Approximation

Authors:
Doina Precup; Richard S. Sutton; Sanjoy Dasgupta
Venue:
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Year:
2001

Citing: 0
Cited by: 19

To Collect or Not to Collect? Machine Learning for Memory Management
Proceedings of the 2nd Java Virtual Machine Research and Technology Symposium

Least-squares policy iteration
The Journal of Machine Learning Research

Standard and averaging reinforcement learning in XCS
Proceedings of the 8th annual conference on Genetic and evolutionary computation

Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
Artificial Intelligence

Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
Machine Learning

Reinforcement learning in the presence of rare events
Proceedings of the 25th international conference on Machine learning

An analysis of reinforcement learning with function approximation
Proceedings of the 25th international conference on Machine learning

Implementing Parametric Reinforcement Learning in Robocup Rescue Simulation
RoboCup 2007: Robot Soccer World Cup XI

Fast gradient-descent methods for temporal-difference learning with linear function approximation
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning

Adaptive importance sampling with automatic model selection in value function approximation
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 3

Adaptive importance sampling for value function approximation in off-policy reinforcement learning
Neural Networks

Real-time reinforcement learning by sequential Actor-Critics and experience replay
Neural Networks

Least absolute policy iteration for robust value function approximation
ICRA'09 Proceedings of the 2009 IEEE international conference on Robotics and Automation

Q-learning with linear function approximation
COLT'07 Proceedings of the 20th annual conference on Learning theory

An off-policy natural policy gradient method for a partial observable Markov decision process
ICANN'05 Proceedings of the 15th international conference on Artificial neural networks: formal models and their applications - Volume Part II

Reinforcement learning with partially known world dynamics
UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence

Policy improvement for POMDPs using normalized importance sampling
UAI'01 Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence

Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
Mathematics of Operations Research

A survey of multi-objective sequential decision-making
Journal of Artificial Intelligence Research
