When the transition probabilities and rewards of a Markov decision process are specified exactly, the problem can be solved without any interaction with the environment. When no such specification is available, the agent's only recourse is a long and potentially dangerous exploration. We present a framework that allows an expert to specify imprecise knowledge of the transition probabilities in the form of stochastic dominance constraints. Our algorithm can be used to find optimal policies for such qualitatively specified problems or, when no exact solution is available, to reduce the amount of exploration required. The algorithm's behavior is demonstrated on simulations of two classic problems: mountain car ascent and cart pole balancing.
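As an illustration of the kind of constraint the abstract refers to (this sketch is not taken from the paper; the function name and example distributions are hypothetical), a first-order stochastic dominance relation between two transition distributions can be checked by comparing their cumulative distribution functions over successor states ordered from worst to best:

import numpy as np

def dominates(p, q, tol=1e-12):
    """First-order stochastic dominance check.

    Successor states are assumed ordered from worst to best.
    p dominates q iff the CDF of p lies at or below the CDF of q
    everywhere, i.e. cumsum(p)[k] <= cumsum(q)[k] for all k.
    """
    return bool(np.all(np.cumsum(p) <= np.cumsum(q) + tol))

# Hypothetical example: two transition distributions over three
# successor states, ordered worst -> best.
p = np.array([0.1, 0.3, 0.6])  # shifts mass toward better states
q = np.array([0.3, 0.4, 0.3])

print(dominates(p, q))  # True: p is at least as favorable as q
print(dominates(q, p))  # False

Under this reading, an expert could assert that one action's successor distribution is at least as favorable as another's without committing to exact probability values, which is the sense in which the knowledge supplied to the algorithm is imprecise.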