Dynamic programming (DP) is a powerful paradigm for general, nonlinear optimal control. Computing exact DP solutions is in general only possible when the process states and the control actions take values in a small discrete set, so in practice the solutions must be approximated. We propose an algorithm for approximate DP that relies on a fuzzy partition of the state space and on a discretization of the action space. This fuzzy Q-iteration algorithm works for deterministic processes under the discounted-return criterion. We prove that fuzzy Q-iteration asymptotically converges to a solution that lies within a bound of the optimal solution, and we also derive a bound on the suboptimality of the solution obtained after a finite number of iterations. Under continuity assumptions on the dynamics and on the reward function, we show that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases. These properties hold both when the parameters of the approximator are updated synchronously and when they are updated asynchronously; the asynchronous algorithm is proven to converge at least as fast as the synchronous one. The performance of fuzzy Q-iteration is illustrated in a two-link manipulator control problem.
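To make the scheme concrete, the following is a minimal Python sketch of synchronous fuzzy Q-iteration, written here for a one-dimensional state space with a triangular fuzzy partition and a finite action set. It is an illustrative reconstruction, not the authors' implementation: the names (`fuzzy_q_iteration`, `triangular_mfs`) and the interfaces of the dynamics `f` and reward `rho` are assumptions.

```python
import numpy as np

def triangular_mfs(cores, x):
    """Normalized triangular membership degrees of scalar state x
    over a sorted 1-D array of membership-function cores (a fuzzy partition)."""
    phi = np.zeros(len(cores))
    if x <= cores[0]:
        phi[0] = 1.0
    elif x >= cores[-1]:
        phi[-1] = 1.0
    else:
        k = np.searchsorted(cores, x) - 1  # interval [cores[k], cores[k+1]] containing x
        w = (x - cores[k]) / (cores[k + 1] - cores[k])
        phi[k], phi[k + 1] = 1.0 - w, w
    return phi

def fuzzy_q_iteration(f, rho, cores, actions, gamma=0.95, tol=1e-8, max_iter=1000):
    """Synchronous fuzzy Q-iteration sketch for a deterministic process.

    f(x, u)   -> next state (deterministic dynamics; assumed given)
    rho(x, u) -> reward
    cores     -> cores x_i of the fuzzy partition of the state space
    actions   -> discretized action set u_j
    Returns theta such that Q(x, u_j) ~= sum_i phi_i(x) * theta[i, j].
    """
    n, m = len(cores), len(actions)
    theta = np.zeros((n, m))
    for _ in range(max_iter):
        new_theta = np.empty_like(theta)
        for i, xi in enumerate(cores):
            for j, uj in enumerate(actions):
                x_next = f(xi, uj)
                phi = triangular_mfs(cores, x_next)
                # Bellman backup at core state x_i and action u_j:
                # phi @ theta interpolates Q(x_next, .) over all actions.
                new_theta[i, j] = rho(xi, uj) + gamma * np.max(phi @ theta)
        if np.max(np.abs(new_theta - theta)) < tol:  # stop near the fixed point
            theta = new_theta
            break
        theta = new_theta
    return theta
```

The asynchronous variant analyzed in the paper would write each updated parameter back into `theta` immediately, rather than batching the updates in `new_theta`; as stated above, it converges at least as fast as the synchronous version.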