Dynamic programming (DP) is a powerful paradigm for general, nonlinear optimal control. Computing exact DP solutions is in general only possible when the process states and the control actions take values in a small discrete set, so in practice the solutions must be approximated. We propose an algorithm for approximate DP that relies on a fuzzy partition of the state space and on a discretization of the action space. This fuzzy Q-iteration algorithm works for deterministic processes under the discounted-return criterion. We prove that fuzzy Q-iteration asymptotically converges to a solution that lies within a bound of the optimal solution, and we also derive a bound on the suboptimality of the solution obtained after a finite number of iterations. Under continuity assumptions on the dynamics and on the reward function, we show that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases. These properties hold both when the parameters of the approximator are updated synchronously and when they are updated asynchronously; the asynchronous algorithm is proven to converge at least as fast as the synchronous one. The performance of fuzzy Q-iteration is illustrated in a two-link manipulator control problem.
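To make the scheme concrete, the following is a minimal Python sketch of synchronous fuzzy Q-iteration, written here for a one-dimensional state space with a triangular fuzzy partition and a finite action set. It is an illustrative reconstruction, not the authors' implementation: the names (`fuzzy_q_iteration`, `triangular_mfs`) and the interfaces of the dynamics `f` and reward `rho` are assumptions.

```python
import numpy as np

def triangular_mfs(cores, x):
    """Normalized triangular membership degrees of scalar state x
    over a sorted 1-D array of membership-function cores (a fuzzy partition)."""
    phi = np.zeros(len(cores))
    if x <= cores[0]:
        phi[0] = 1.0
    elif x >= cores[-1]:
        phi[-1] = 1.0
    else:
        k = np.searchsorted(cores, x) - 1  # interval [cores[k], cores[k+1]] containing x
        w = (x - cores[k]) / (cores[k + 1] - cores[k])
        phi[k], phi[k + 1] = 1.0 - w, w
    return phi

def fuzzy_q_iteration(f, rho, cores, actions, gamma=0.95, tol=1e-8, max_iter=1000):
    """Synchronous fuzzy Q-iteration sketch for a deterministic process.

    f(x, u)   -> next state (deterministic dynamics; assumed given)
    rho(x, u) -> reward
    cores     -> cores x_i of the fuzzy partition of the state space
    actions   -> discretized action set u_j
    Returns theta such that Q(x, u_j) ~= sum_i phi_i(x) * theta[i, j].
    """
    n, m = len(cores), len(actions)
    theta = np.zeros((n, m))
    for _ in range(max_iter):
        new_theta = np.empty_like(theta)
        for i, xi in enumerate(cores):
            for j, uj in enumerate(actions):
                x_next = f(xi, uj)
                phi = triangular_mfs(cores, x_next)
                # Bellman backup at core state x_i and action u_j:
                # phi @ theta interpolates Q(x_next, .) over all actions.
                new_theta[i, j] = rho(xi, uj) + gamma * np.max(phi @ theta)
        if np.max(np.abs(new_theta - theta)) < tol:  # stop near the fixed point
            theta = new_theta
            break
        theta = new_theta
    return theta
```

The asynchronous variant analyzed in the paper would write each updated parameter back into `theta` immediately, rather than batching the updates in `new_theta`; as stated above, it converges at least as fast as the synchronous version.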