This paper presents a multi-agent reinforcement learning algorithm with fuzzy policies, applied to control problems in cooperative multi-robot systems. Two systems are investigated: a leader-follower robotic system and a flocking system. In the leader-follower system, the leader robot tracks a desired trajectory while the follower robot follows the leader to maintain a formation; two different fuzzy policies are developed, one for the leader and one for the follower. In the flocking system, multiple robots share a single fuzzy policy to flock. Initial fuzzy policies are hand-crafted for these cooperative behaviors, and the proposed learning algorithm then fine-tunes the parameters of the fuzzy policies via a policy-gradient approach to improve control performance. Simulation results demonstrate that control performance improves after learning.
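The tuning loop described above can be sketched in a minimal form. The snippet below is an illustrative assumption, not the authors' implementation: it uses a zero-order Takagi-Sugeno fuzzy policy (Gaussian memberships over a 1-D tracking error, one consequent parameter per rule) for a follower keeping a unit gap behind a constant-velocity leader, and fine-tunes the hand-initialized consequents with a REINFORCE-style episodic policy gradient. All rule centers, widths, rates, and dynamics are invented for the example.

```python
import numpy as np

# Hypothetical sketch of policy-gradient fine-tuning of a fuzzy policy.
# Names and constants are illustrative assumptions, not the paper's setup.
rng = np.random.default_rng(0)

CENTERS = np.linspace(-2.0, 2.0, 5)   # rule centers over the gap error
SIGMA = 0.8                           # membership width
NOISE = 0.1                           # Gaussian exploration std


def firing_strengths(error):
    """Normalized Gaussian membership values, one per fuzzy rule."""
    mu = np.exp(-((error - CENTERS) ** 2) / (2.0 * SIGMA ** 2))
    return mu / (mu.sum() + 1e-12)    # epsilon guards against underflow


def fuzzy_action(theta, error):
    """Zero-order Takagi-Sugeno output: weighted sum of rule consequents."""
    return firing_strengths(error) @ theta


def run_episode(theta, steps=30):
    """Follower tries to keep a unit gap behind a constant-speed leader.

    Returns the episode return and the summed score function
    (d log pi / d theta) for a Gaussian-exploration policy.
    """
    leader, follower = 0.0, -1.5
    score = np.zeros_like(theta)
    ret = 0.0
    for _ in range(steps):
        error = (leader - follower) - 1.0          # gap error
        phi = firing_strengths(error)
        mean = phi @ theta
        action = mean + NOISE * rng.normal()
        # Gaussian policy: d log pi / d theta = (a - mean)/NOISE^2 * phi
        score += (action - mean) / NOISE ** 2 * phi
        follower += action
        leader += 0.1                              # constant leader speed
        ret += -error ** 2                         # reward: -squared error
    return ret, score


def train(episodes=200, lr=1e-5):
    """REINFORCE with a moving-average baseline on the fuzzy consequents."""
    theta = np.zeros_like(CENTERS)                 # hand-crafted initial policy
    baseline = 0.0
    for ep in range(episodes):
        ret, score = run_episode(theta)
        baseline = ret if ep == 0 else 0.9 * baseline + 0.1 * ret
        theta += lr * (ret - baseline) * score     # advantage-weighted update
    return theta
```

Only the rule consequents are learned here; the paper's algorithm could equally adjust membership centers and widths, and the flocking case would share one such `theta` across all robots.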