Integrating a partial model into model free reinforcement learning

Authors:
Aviv Tamar; Dotan Di Castro; Ron Meir
Affiliations:
Department of Electrical Engineering, Technion, Haifa, Israel (all authors)
Venue:
The Journal of Machine Learning Research
Year:
2012

Abstract

In reinforcement learning, an agent uses online feedback from the environment to adaptively select an effective policy. Model-free approaches address this task by directly mapping environmental states to actions, while model-based methods attempt to construct a model of the environment and then select optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel procedure which augments a model-free algorithm with a partial model. The resulting hybrid algorithm switches between a model-based and a model-free mode, depending on the current state and the agent's knowledge. Our method relies on a novel definition of a partially known model, and an estimator that incorporates such knowledge in order to reduce uncertainty in stochastic approximation iterations. We prove that such an approach leads to improved policy evaluation whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach on policy gradient and Q-learning algorithms, and its usefulness in solving a call admission control problem.
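
The switching idea in the abstract can be illustrated with a small policy-evaluation sketch. This is not the authors' algorithm or estimator, only one plausible reading of it under stated assumptions: a fixed policy on a tiny Markov chain, a hypothetical set `known` of states whose transition probabilities are available, and a TD(0)-style update that uses the exact expected target in known states and the usual sampled target elsewhere. The names `hybrid_td0`, `exact_values`, `P`, `r`, and `known` are all invented for illustration.

```python
import random


def exact_values(P, r, gamma, sweeps=2000):
    """Reference solution of V = r + gamma * P V via repeated synchronous sweeps."""
    n = len(r)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [r[s] + gamma * sum(P[s][j] * V[j] for j in range(n)) for s in range(n)]
    return V


def hybrid_td0(P, r, known, gamma=0.9, alpha=0.1, steps=20000, seed=0):
    """TD(0)-style evaluation that switches mode depending on the current state.

    Assumption (illustrative only): for states in `known`, the row P[s] is
    available to the agent, so the noisy sampled target can be replaced by
    its expectation. For all other states, P is used only to simulate the
    environment and the update is the ordinary model-free one.
    """
    rng = random.Random(seed)
    n = len(r)
    V = [0.0] * n
    s = 0
    for _ in range(steps):
        # The environment draws the next state; the agent merely observes it.
        s_next = rng.choices(range(n), weights=P[s])[0]
        if s in known:
            # Model-based mode: expected one-step target, no sampling noise.
            target = r[s] + gamma * sum(P[s][j] * V[j] for j in range(n))
        else:
            # Model-free mode: sampled one-step target.
            target = r[s] + gamma * V[s_next]
        V[s] += alpha * (target - V[s])
        s = s_next
    return V


if __name__ == "__main__":
    # Hypothetical 3-state chain under a fixed policy.
    P = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
    r = [1.0, 0.0, 2.0]
    print("exact      :", exact_values(P, r, 0.9))
    print("model-free :", hybrid_td0(P, r, known=set()))
    print("partial    :", hybrid_td0(P, r, known={0}))
    print("full model :", hybrid_td0(P, r, known={0, 1, 2}))
```

With an empty `known` set the sketch reduces to plain TD(0), which mirrors the abstract's claim that nothing is lost when no knowledge is available. As more states are marked known, more updates become noise-free.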