Brief paper: Average cost temporal-difference learning

Authors:
John N. Tsitsiklis;Benjamin Van Roy
Affiliations:
Laboratory for Information and Decision Systems, Room 35-209, 77 Massachusetts Avenue, Massachusetts Institute of Technology, Cambridge, MA 02139-4307, USA;Laboratory for Information and Decision Systems, Room 35-209, 77 Massachusetts Avenue, Massachusetts Institute of Technology, Cambridge, MA 02139-4307, USA
Venue:
Automatica (Journal of IFAC)
Year:
1999

Citing 6

Adaptive algorithms and stochastic approximations.
The Convergence of TD(λ) for General λ. Machine Learning.
Average reward reinforcement learning: foundations, algorithms, and empirical results. Machine Learning - Special issue on reinforcement learning.
Neuro-Dynamic Programming.
Learning to Predict by the Methods of Temporal Differences. Machine Learning.
Stochastic approximation for non-expansive maps: applications to q-learning algorithms.

Cited 20

On Average Versus Discounted Reward Temporal-Difference Learning. Machine Learning.
Opponent interactions between serotonin and dopamine. Neural Networks - Computational models of neuromodulation.
From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning. Discrete Event Dynamic Systems.
Long-term reward prediction in TD models of the dopamine system. Neural Computation.
Optimizing Average Reward Using Discounted Rewards. COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory.
Performance Loss Bounds for Approximate Value Iteration with State Aggregation. Mathematics of Operations Research.
Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes. Simulation.
New Error Bounds for Approximations from Projected Linear Equations. Recent Advances in Reinforcement Learning.
Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics.
A reinforcement learning framework for utility-based scheduling in resource-constrained systems. Future Generation Computer Systems.
Natural actor-critic algorithms. Automatica (Journal of IFAC).
Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning. Neural Computation.
Hyperbolically discounted temporal difference learning. Neural Computation.
Error Bounds for Approximations from Projected Linear Equations. Mathematics of Operations Research.
Adaptive data-aware utility-based scheduling in resource-constrained systems. Journal of Parallel and Distributed Computing.
A simulation-based approximate dynamic programming approach for the control of the Intel Mini-Fab benchmark model. Winter Simulation Conference.
Adaptive utility-based scheduling in resource-constrained systems. AI'05 Proceedings of the 18th Australian Joint conference on Advances in Artificial Intelligence.
A time aggregation approach to Markov decision processes. Automatica (Journal of IFAC).
Optimization model selection for simulation-based approximate dynamic programming approaches in semiconductor manufacturing operations. Proceedings of the Winter Simulation Conference.
An Actor-Critic based controller for glucose regulation in type 1 diabetes. Computer Methods and Programs in Biomedicine.

Abstract

We propose a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations consist of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present a proof of convergence (with probability 1) and a characterization of the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain. The results parallel previous work by the authors involving approximations of discounted cost-to-go.
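
The abstract does not spell out the update rule, so the following is only a minimal sketch of one common form of average-cost TD(λ) with linear features, run along a single simulated trajectory. The function name `average_cost_td`, the three-state chain, the costs, the single basis function, and both step-size schedules are illustrative assumptions, not taken from the paper.

```python
import random


def average_cost_td(P, g, phi, steps, lam=0.0, seed=0):
    """Sketch of average-cost TD(lambda) with linear function approximation.

    P   : transition matrix (list of rows) of the Markov chain
    g   : per-state costs
    phi : per-state feature vectors (the fixed basis functions)
    All names and step sizes here are illustrative assumptions.
    """
    rng = random.Random(seed)
    states = range(len(P))
    r = [0.0] * len(phi[0])   # weights of the basis functions
    mu = 0.0                  # running estimate of the average cost
    i = 0                     # current state
    z = list(phi[i])          # eligibility trace, started at phi(i_0)
    for t in range(steps):
        # Simulate one transition of the chain.
        j = rng.choices(states, weights=P[i])[0]
        gamma = 10.0 / (100.0 + t)   # assumed step size for the weights
        eta = 1.0 / (t + 1)          # assumed step size: mu is the sample mean
        v_i = sum(w * f for w, f in zip(r, phi[i]))
        v_j = sum(w * f for w, f in zip(r, phi[j]))
        # Temporal difference for the differential cost (no discounting).
        d = g[i] - mu + v_j - v_i
        r = [w + gamma * d * e for w, e in zip(r, z)]
        mu += eta * (g[i] - mu)
        # Trace decays by lam only; there is no discount factor.
        z = [lam * e + f for e, f in zip(z, phi[j])]
        i = j
    return mu, r


# Hypothetical example: a birth-death chain with stationary
# distribution (0.25, 0.5, 0.25), so the true average cost is 1.5.
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
g = [0.0, 1.0, 4.0]
phi = [[0.0], [1.0], [2.0]]   # one basis function, not a constant
mu, r = average_cost_td(P, g, phi, steps=200_000)
print(mu, r)
```

For this toy chain with λ = 0, the expected weight update vanishes at a weight of 4, so the printed values should land near 1.5 and 4 respectively, up to simulation noise. The basis function is deliberately not constant, since a constant offset carries no information about differential costs.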