We consider the problem of reinforcement learning in a controlled Markov environment with multiple objective functions of the long-term average reward type. The environment is initially unknown and may, moreover, be affected by the actions of other agents, actions that are observed but cannot be predicted beforehand. We capture this situation using a stochastic game model, in which the learning agent faces an adversary whose policy is arbitrary and unknown, and in which the reward function is vector-valued. State recurrence conditions are imposed throughout. In our basic problem formulation, a desired target set is specified in the vector reward space, and the objective of the learning agent is to approach the target set, in the sense that the long-term average reward vector will belong to this set. We devise appropriate learning algorithms, which essentially combine multiple reinforcement learning algorithms for the standard scalar-reward problem using the geometric insight from the theory of approachability for vector-valued stochastic games. We then address the more general, optimization-related problem in which a nested class of possible target sets is prescribed, and the goal of the learning agent is to approach the smallest possible target set (which will generally depend on the unknown system parameters). A particular case that falls into this framework is that of stochastic games with average reward constraints, and further specialization yields a reinforcement learning algorithm for constrained Markov decision processes. Some basic examples are provided to illustrate these results.
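The core geometric idea, approaching a target set by repeatedly steering the running average reward vector toward its projection on the set, can be sketched in a minimal form. The example below is purely illustrative and is not taken from the paper: it uses a hypothetical single-state problem with two actions, two-dimensional rewards, and no adversary, so the "scalar step" degenerates to picking the action with the best inner product against the steering direction. All names and numbers are assumptions for the sketch.

```python
import numpy as np

# Hypothetical single-state decision problem with 2-dimensional vector rewards:
# action 0 yields reward (1, 0); action 1 yields (0, 1).
REWARDS = np.array([[1.0, 0.0],
                    [0.0, 1.0]])

# Target set: all vectors with both coordinates >= 0.4. No single action
# attains it, but a suitable mixture of the two actions does.
TARGET_FLOOR = np.array([0.4, 0.4])

def project_to_target(m):
    """Euclidean projection of m onto the set {x : x >= TARGET_FLOOR}."""
    return np.maximum(m, TARGET_FLOOR)

def approach_target(steps=2000):
    """Blackwell-style steering: at each step, play the action whose reward
    vector has the largest inner product with the direction pointing from
    the current average reward toward its projection on the target set."""
    m = np.zeros(2)  # running average reward vector
    for t in range(1, steps + 1):
        d = project_to_target(m) - m      # steering direction toward the set
        if not d.any():
            d = np.ones(2)                # already inside: any action is fine
        a = int(np.argmax(REWARDS @ d))   # scalarized step: maximize r . d
        m += (REWARDS[a] - m) / t         # incremental average update
    return m

m = approach_target()
print(m)  # long-run average reward vector; its deficit w.r.t. the target shrinks
```

In the full setting of the paper, the inner scalarized step is itself a reinforcement learning problem (the transition and reward models are unknown and an adversary acts concurrently), which is why the algorithm composes standard scalar-reward RL subroutines rather than a one-line argmax.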