Fuzzy Policy Reinforcement Learning in Cooperative Multi-robot Systems
Journal of Intelligent and Robotic Systems
Biologically-inspired adaptive learning control strategies: A rough set approach
International Journal of Hybrid Intelligent Systems
Toward Approximate Adaptive Learning
RSEISP '07 Proceedings of the international conference on Rough Sets and Intelligent Systems Paradigms
Influence of Temperature on Swarmbots that Learn
Cybernetics and Systems
Learning the Filling Policy of a Biodegradation Process by Fuzzy Actor-Critic Learning Methodology
MICAI '08 Proceedings of the 7th Mexican International Conference on Artificial Intelligence: Advances in Artificial Intelligence
Exploration and exploitation balance management in fuzzy reinforcement learning
Fuzzy Sets and Systems
Approximate dynamic programming with a fuzzy parameterization
Automatica (Journal of IFAC)
CIRA'09 Proceedings of the 8th IEEE international conference on Computational intelligence in robotics and automation
Continuous-state reinforcement learning with fuzzy approximation
ALAMAS'05/ALAMAS'06/ALAMAS'07 Proceedings of the 5th, 6th and 7th European conference on Adaptive and Learning Agents and Multi-Agent Systems: adaptation and multi-agent learning
Similarity of learned helplessness in human being and fuzzy reinforcement learning algorithms
Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology - Computational intelligence models for image processing and information reasoning
This paper provides the first convergence proof for fuzzy reinforcement learning (FRL) as well as experimental results supporting our analysis. We extend the work of Konda and Tsitsiklis, who presented a convergent actor-critic (AC) algorithm for a general parameterized actor. In our work we prove that a fuzzy rulebase actor satisfies the necessary conditions that guarantee the convergence of its parameters to a local optimum. Our fuzzy rulebase uses Takagi-Sugeno-Kang rules, Gaussian membership functions, and product inference. As an application domain, we chose a difficult task of power control in wireless transmitters, characterized by delayed rewards and a high degree of stochasticity. To the best of our knowledge, no reinforcement learning algorithms have been previously applied to this task. Our simulation results show that the ACFRL algorithm consistently converges in this domain to a locally optimal policy.
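The abstract's fuzzy rulebase (Takagi-Sugeno-Kang rules, Gaussian membership functions, product inference) can be illustrated with a minimal sketch. This is not the paper's implementation; the rule centers, widths, and consequents below are hypothetical, and a zero-order TSK form (constant consequents) is assumed for brevity:

```python
import math

def gaussian(x, center, width):
    """Gaussian membership degree of x in a fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def tsk_output(state, rules):
    """Zero-order TSK inference: each rule's firing strength is the
    product of its Gaussian memberships over the state components;
    the output is the strength-weighted average of rule consequents."""
    num = den = 0.0
    for centers, widths, consequent in rules:
        strength = 1.0
        for x, c, w in zip(state, centers, widths):
            strength *= gaussian(x, c, w)
        num += strength * consequent
        den += strength
    return num / den if den > 0 else 0.0

# Hypothetical two-rule base over a one-dimensional state:
rules = [
    ((-1.0,), (1.0,), 0.0),  # IF x is near -1 THEN u = 0.0
    (( 1.0,), (1.0,), 1.0),  # IF x is near +1 THEN u = 1.0
]
print(tsk_output((1.0,), rules))  # ≈ 0.881
```

In an actor-critic setting such as the one the abstract describes, the consequents would be the tunable actor parameters updated from the critic's feedback; here they are fixed constants purely to show the inference step.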