Reinforcement learning for MDPs with constraints

Authors:
Peter Geibel
Affiliations:
Institute of Cognitive Science, AI Group, University of Osnabrück, Germany
Venue:
ECML'06 Proceedings of the 17th European conference on Machine Learning
Year:
2006

Abstract

In this article, I consider Markov Decision Processes with two criteria, each defined as the expected value of an infinite-horizon cumulative return. The second criterion is either itself subject to an inequality constraint, or there is a maximum allowable probability that individual returns violate that constraint. I describe and discuss three new reinforcement learning approaches for solving such control problems.
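The two constraint variants in the abstract can be made concrete with a small policy-evaluation sketch. This is not one of the article's three learning approaches. It only estimates, by Monte Carlo simulation, the quantities those approaches must trade off for a fixed policy: the first criterion J1 = E[R1], the second criterion J2 = E[R2], and the probability that an individual return R2 violates its threshold. Several details are assumptions, not taken from the source: returns are discounted with a factor `gamma` and truncated after `horizon` steps; the constraint has the form "at most c"; and the MDP encoding, the helper names `evaluate` and `feasible`, the threshold `c`, the risk level `alpha`, and the toy MDP are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical encoding (illustration only, not from the article):
# mdp[state][action] -> list of (probability, next_state, r1, r2).
# r1 feeds the first criterion, r2 the constrained second criterion.
Transition = Tuple[float, int, float, float]
MDP = Dict[int, Dict[int, List[Transition]]]
Policy = Dict[int, List[Tuple[float, int]]]  # state -> [(prob, action)]


def sample(rng: random.Random, outcomes):
    """Draw one element from a list whose first field is a probability."""
    u, acc = rng.random(), 0.0
    for item in outcomes:
        acc += item[0]
        if u < acc:
            return item
    return outcomes[-1]  # guard against floating-point round-off


def rollout(mdp: MDP, policy: Policy, start: int, gamma: float,
            horizon: int, rng: random.Random) -> Tuple[float, float]:
    """Discounted returns (R1, R2) of one episode, truncated at `horizon`."""
    s, disc, ret1, ret2 = start, 1.0, 0.0, 0.0
    for _ in range(horizon):
        _, a = sample(rng, policy[s])
        _, s_next, r1, r2 = sample(rng, mdp[s][a])
        ret1 += disc * r1
        ret2 += disc * r2
        disc *= gamma
        s = s_next
    return ret1, ret2


def evaluate(mdp: MDP, policy: Policy, start: int, c: float, gamma: float = 0.9,
             horizon: int = 200, episodes: int = 2000, seed: int = 0):
    """Monte Carlo estimates of J1 = E[R1], J2 = E[R2] and P(R2 > c)."""
    rng = random.Random(seed)
    runs = [rollout(mdp, policy, start, gamma, horizon, rng) for _ in range(episodes)]
    j1 = sum(r1 for r1, _ in runs) / episodes
    j2 = sum(r2 for _, r2 in runs) / episodes
    p_violate = sum(1 for _, r2 in runs if r2 > c) / episodes
    return j1, j2, p_violate


def feasible(j2: float, p_violate: float, c: float, alpha=None) -> bool:
    """Expectation constraint J2 <= c, or, if alpha is given, P(R2 > c) <= alpha."""
    if alpha is None:
        return j2 <= c
    return p_violate <= alpha


# Hypothetical one-state toy MDP: action 0 is "safe" (r1 = 1, never any r2),
# action 1 is "greedy" (r1 = 2, but r2 = 1 with probability 0.5).
TOY_MDP: MDP = {0: {0: [(1.0, 0, 1.0, 0.0)],
                    1: [(0.5, 0, 2.0, 1.0), (0.5, 0, 2.0, 0.0)]}}
SAFE: Policy = {0: [(1.0, 0)]}
GREEDY: Policy = {0: [(1.0, 1)]}

if __name__ == "__main__":
    for name, pol in (("safe", SAFE), ("greedy", GREEDY)):
        j1, j2, p = evaluate(TOY_MDP, pol, start=0, c=6.0)
        print(name, round(j1, 2), round(j2, 2), round(p, 3),
              feasible(j2, p, 6.0), feasible(j2, p, 6.0, alpha=0.05))
```

In the toy example, the greedy policy roughly doubles the first criterion relative to the safe one, and its expected second return (about 5) satisfies an expectation constraint with c = 6. Individual returns still exceed 6 in a noticeable share of episodes, so a tight probabilistic constraint such as alpha = 0.05 rules it out. This gap between the two constraint types is what separates the two problem variants described above.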