We propose a variant of temporal-difference learning that approximates the average and differential costs of an irreducible, aperiodic Markov chain. The approximations are linear combinations of fixed basis functions, with weights updated incrementally along a single endless trajectory of the chain. We present a proof of convergence (with probability 1) and a characterization of the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the "mixing time" of the Markov chain. The results parallel previous work by the authors on approximating the discounted cost-to-go.
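The update scheme described above can be sketched in a few lines: alongside the weight vector for the differential-cost approximation, the algorithm maintains a running estimate of the average cost and an eligibility trace, all updated from a single trajectory. The sketch below is illustrative only; the toy three-state chain, the one-hot basis `phi`, and the step-size schedule are assumptions, not details taken from the paper.

```python
import numpy as np

# Hedged sketch of average-cost TD(lambda) with linear function
# approximation. The chain, costs, and features below are invented
# for illustration; only the update structure follows the abstract.

rng = np.random.default_rng(0)

# An irreducible, aperiodic Markov chain on three states.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
cost = np.array([1.0, 0.0, 2.0])     # per-stage cost g(x)

def phi(x):
    # Fixed basis functions; one-hot features make this the tabular case.
    e = np.zeros(3)
    e[x] = 1.0
    return e

lam = 0.7          # trace-decay parameter lambda
r = np.zeros(3)    # weights of the linear differential-cost approximation
mu = 0.0           # running estimate of the average cost
z = np.zeros(3)    # eligibility trace
x = 0
for t in range(1, 200_000):
    step = 1.0 / t                       # diminishing step size
    y = rng.choice(3, p=P[x])            # next state on the single trajectory
    mu += step * (cost[x] - mu)          # average-cost estimate
    # Temporal difference for the differential cost function.
    d = cost[x] - mu + phi(y) @ r - phi(x) @ r
    z = lam * z + phi(x)                 # accumulate the trace
    r += step * d * z                    # incremental weight update
    x = y

# mu should approach the chain's true average cost pi @ cost.
pi = np.linalg.matrix_power(P, 50)[0]    # stationary distribution (approx.)
print(round(mu, 3), round(pi @ cost, 3))
```

With one-hot features the weights recover the differential costs exactly (up to an additive constant); with fewer basis functions than states, the limit is characterized, and its error bounded, as in the results summarized above.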