We consider the problem of reinforcement learning in a controlled Markov environment with multiple objective functions of the long-term average reward type. The environment is initially unknown and may, moreover, be affected by the actions of other agents, actions that are observed but cannot be predicted beforehand. We capture this situation using a stochastic game model, in which the learning agent faces an adversary whose policy is arbitrary and unknown, and in which the reward function is vector-valued. State recurrence conditions are imposed throughout. In our basic problem formulation, a desired target set is specified in the vector reward space, and the objective of the learning agent is to approach the target set, in the sense that the long-term average reward vector will belong to this set. We devise appropriate learning algorithms, which essentially combine multiple reinforcement learning algorithms for the standard scalar-reward problem using the geometric insight from the theory of approachability for vector-valued stochastic games. We then address the more general, optimization-related problem in which a nested class of possible target sets is prescribed, and the goal of the learning agent is to approach the smallest possible target set (which will generally depend on the unknown system parameters). A particular case that falls into this framework is that of stochastic games with average reward constraints, and further specialization yields a reinforcement learning algorithm for constrained Markov decision processes. Some basic examples are provided to illustrate these results.
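The core geometric idea, approaching a target set by repeatedly steering the running average reward vector toward its projection on the set, can be sketched in a minimal form. The example below is purely illustrative and is not taken from the paper: it uses a hypothetical single-state problem with two actions, two-dimensional rewards, and no adversary, so the "scalar step" degenerates to picking the action with the best inner product against the steering direction. All names and numbers are assumptions for the sketch.

```python
import numpy as np

# Hypothetical single-state decision problem with 2-dimensional vector rewards:
# action 0 yields reward (1, 0); action 1 yields (0, 1).
REWARDS = np.array([[1.0, 0.0],
                    [0.0, 1.0]])

# Target set: all vectors with both coordinates >= 0.4. No single action
# attains it, but a suitable mixture of the two actions does.
TARGET_FLOOR = np.array([0.4, 0.4])

def project_to_target(m):
    """Euclidean projection of m onto the set {x : x >= TARGET_FLOOR}."""
    return np.maximum(m, TARGET_FLOOR)

def approach_target(steps=2000):
    """Blackwell-style steering: at each step, play the action whose reward
    vector has the largest inner product with the direction pointing from
    the current average reward toward its projection on the target set."""
    m = np.zeros(2)  # running average reward vector
    for t in range(1, steps + 1):
        d = project_to_target(m) - m      # steering direction toward the set
        if not d.any():
            d = np.ones(2)                # already inside: any action is fine
        a = int(np.argmax(REWARDS @ d))   # scalarized step: maximize r . d
        m += (REWARDS[a] - m) / t         # incremental average update
    return m

m = approach_target()
print(m)  # long-run average reward vector; its deficit w.r.t. the target shrinks
```

In the full setting of the paper, the inner scalarized step is itself a reinforcement learning problem (the transition and reward models are unknown and an adversary acts concurrently), which is why the algorithm composes standard scalar-reward RL subroutines rather than a one-line argmax.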