When the transition probabilities and rewards of a Markov decision process are specified exactly, the problem can be solved without any interaction with the environment. When no such specification is available, the agent's only recourse is a long and potentially dangerous exploration. We present a framework that allows an expert to specify imprecise knowledge of the transition probabilities in the form of stochastic dominance constraints. Our algorithm can be used to find optimal policies for such qualitatively specified problems or, when no exact solution is available, to reduce the amount of exploration required. The algorithm's behavior is demonstrated on simulations of two classic problems: mountain car ascent and cart pole balancing.
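As an illustration of the kind of constraint the abstract refers to (this sketch is not taken from the paper; the function name and example distributions are hypothetical), a first-order stochastic dominance relation between two transition distributions can be checked by comparing their cumulative distribution functions over successor states ordered from worst to best:

import numpy as np

def dominates(p, q, tol=1e-12):
    """First-order stochastic dominance check.

    Successor states are assumed ordered from worst to best.
    p dominates q iff the CDF of p lies at or below the CDF of q
    everywhere, i.e. cumsum(p)[k] <= cumsum(q)[k] for all k.
    """
    return bool(np.all(np.cumsum(p) <= np.cumsum(q) + tol))

# Hypothetical example: two transition distributions over three
# successor states, ordered worst -> best.
p = np.array([0.1, 0.3, 0.6])  # shifts mass toward better states
q = np.array([0.3, 0.4, 0.3])

print(dominates(p, q))  # True: p is at least as favorable as q
print(dominates(q, p))  # False

Under this reading, an expert could assert that one action's successor distribution is at least as favorable as another's without committing to exact probability values, which is the sense in which the knowledge supplied to the algorithm is imprecise.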