We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits of using invariant distributions of appropriate policies as projection weights. Such projection weighting relates to what is done by temporal-difference learning. Our analysis also leads to the first performance loss bound for approximate value iteration with an average-cost objective.
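To make the setup concrete, the following is a minimal sketch (not the paper's own code) of approximate value iteration with state aggregation: each partition of the state space carries a single constant cost-to-go parameter, each iteration applies the Bellman operator to the lifted value function, and the result is projected back onto the partition by a weighted average using projection weights w, which the abstract suggests taking as an invariant distribution of an appropriate policy. The tensor layout P[a, s, s'], the cost array c[a, s], and the tiny randomly generated MDP are illustrative assumptions.

import numpy as np

def aggregated_value_iteration(P, c, gamma, agg, w, num_iters=500):
    """Approximate value iteration with one constant cost-to-go value per partition.

    P   : transitions, shape (num_actions, num_states, num_states), row-stochastic in s'
    c   : per-action, per-state costs, shape (num_actions, num_states)
    agg : partition index of each state, shape (num_states,)
    w   : projection weights over states (e.g., an invariant distribution), shape (num_states,)
    """
    num_parts = agg.max() + 1
    r = np.zeros(num_parts)                      # one parameter per partition
    for _ in range(num_iters):
        J = r[agg]                               # lift partition values to states
        # Bellman backup at every state: minimize cost plus discounted expected value over actions
        TJ = (c + gamma * (P @ J)).min(axis=0)   # shape (num_states,)
        # Project the backed-up values onto the aggregation architecture using weights w
        for k in range(num_parts):
            mask = (agg == k)
            r[k] = np.average(TJ[mask], weights=w[mask])
    return r

# Tiny illustrative MDP (all numbers arbitrary, for demonstration only):
rng = np.random.default_rng(0)
num_actions, num_states = 2, 6
P = rng.random((num_actions, num_states, num_states))
P /= P.sum(axis=2, keepdims=True)                # normalize to row-stochastic transitions
c = rng.random((num_actions, num_states))
agg = np.array([0, 0, 1, 1, 2, 2])               # three partitions of two states each
w = np.full(num_states, 1.0 / num_states)        # uniform projection weights as a stand-in
print("partition values:", aggregated_value_iteration(P, c, gamma=0.9, agg=agg, w=w))

Replacing the uniform w with the invariant distribution of the policy greedy with respect to the fixed-point approximation is the kind of projection weighting the abstract relates to temporal-difference learning; the sketch above leaves that choice to the caller.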