This paper compares reinforcement learning (RL) with model predictive control (MPC) in a unified framework and reports experimental results from their application to the synthesis of a controller for a nonlinear, deterministic electrical power oscillation damping problem. Both families of methods formulate the control problem as a discrete-time optimal control problem. The MPC approach considered exploits an analytical model of the system dynamics and cost function and computes open-loop policies by applying an interior-point solver to a minimization problem in which the system dynamics are represented as equality constraints. The RL approach considered infers closed-loop policies in a model-free way from a set of system trajectories and instantaneous cost values by solving a sequence of batch-mode supervised learning problems. The results provide insight into the pros and cons of the two approaches and show that RL can be competitive with MPC even in contexts where a good deterministic system model is available.
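The batch-mode RL scheme described in the abstract can be sketched as a fitted Q iteration loop: collect one-step transitions, then repeatedly solve a supervised regression problem whose targets are the instantaneous cost plus the discounted minimum of the previous Q approximation. The toy dynamics, cost function, and k-nearest-neighbor regressor below are illustrative assumptions, not the paper's power-system benchmark or its tree-based learner:

```python
# Minimal sketch of batch-mode RL (fitted Q iteration) on a hypothetical
# 1-D deterministic system; all model details here are illustrative.
import random

random.seed(0)

GAMMA = 0.95                  # discount factor
ACTIONS = [-1.0, 0.0, 1.0]    # finite control set

def dynamics(x, u):
    # toy unstable deterministic dynamics (assumption, not the paper's model)
    return x + 0.1 * (x + u)

def cost(x, u):
    # instantaneous cost: keep the state near 0, small control penalty
    return x * x + 0.01 * u * u

# 1. Collect a batch of one-step system transitions (x, u, c, x_next).
transitions = []
for _ in range(500):
    x = random.uniform(-2.0, 2.0)
    u = random.choice(ACTIONS)
    transitions.append((x, u, cost(x, u), dynamics(x, u)))

# 2. A k-NN regressor stands in for the batch-mode supervised learner.
def knn_predict(data, x, u, k=5):
    # data: list of ((state, action), q_value); average the k nearest
    # same-action samples in state space.
    same_u = [(abs(xs - x), q) for (xs, us), q in data if us == u]
    same_u.sort(key=lambda t: t[0])
    nearest = same_u[:k]
    return sum(q for _, q in nearest) / len(nearest) if nearest else 0.0

# 3. Fitted Q iteration: each pass regresses Q_n = c + gamma * min_u' Q_{n-1}.
q_data = []
for _ in range(30):
    targets = []
    for x, u, c, x2 in transitions:
        if q_data:
            c = c + GAMMA * min(knn_predict(q_data, x2, u2) for u2 in ACTIONS)
        targets.append(((x, u), c))
    q_data = targets

def policy(x):
    # greedy closed-loop policy extracted from the final Q approximation
    return min(ACTIONS, key=lambda u: knn_predict(q_data, x, u))
```

With these toy dynamics the learned policy pushes the state back toward the origin (e.g. `policy(1.5)` selects the negative control), mirroring how the closed-loop policy in the paper is obtained without any analytical model, purely from the stored trajectories.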