Scaling ant colony optimization with hierarchical reinforcement learning partitioning

Authors:
Erik J. Dries; Gilbert L. Peterson
Affiliations:
Air Force Institute of Technology, Wright-Patterson AFB, OH, USA (both authors)
Venue:
Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation (GECCO '08)
Year:
2008

References:
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence.
Swarm intelligence: from natural to artificial systems.
Introduction to Reinforcement Learning.
Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems.
X-means: Extending K-means with Efficient Estimation of the Number of Clusters. ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning.
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms.
Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research.
Ant colony optimization. IEEE Computational Intelligence Magazine.
Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation.

Cited by:
Adaptive decision making in ant colony system by reinforcement learning. ICONIP'10: Proceedings of the 17th International Conference on Neural Information Processing: Theory and Algorithms, Volume Part I.

Abstract

This paper merges hierarchical reinforcement learning (HRL) with ant colony optimization (ACO) to produce an HRL ACO algorithm capable of generating solutions for large domains. It describes two specific implementations of the new algorithm: the first is a modification of Dietterich's MAXQ-Q HRL algorithm, and the second is a hierarchical ant colony system algorithm. For the tested problem domains, these implementations produce results faster, with little or no significant change in solution quality. Applying ACO to the MAXQ-Q algorithm replaces its reinforcement learning method, Q-learning, with Ant-Q, a modified ant colony optimization method. The resulting algorithm, MAXQ-AntQ, converges to solutions not significantly different from those of MAXQ-Q 88% of the time. The paper then transfers HRL techniques to ACO and the traveling salesman problem (TSP). Applying HRL to ACO requires a hierarchy for the TSP: a data clustering algorithm creates the subtasks, and an ACO algorithm solves both the individual subproblems and the complete problem. Two clustering algorithms are tested, k-means and G-means. The results show that the algorithm with data clustering produces solutions 20 times faster, with a 5-10% decrease in solution quality due to the effects of clustering.
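To make the Q-learning/Ant-Q substitution concrete, the two update rules are shown below in their commonly stated forms. The abstract does not say how MAXQ-AntQ maps Ant-Q onto the MAXQ subtasks, so this is only the flat, textbook version of each rule. In Ant-Q, $i$ and $j$ are cities, $J_k(j)$ is the set of cities ant $k$ may still visit from $j$, and $\Delta AQ(i,j)$ is a delayed reinforcement that, in the usual formulation, is zero during tour construction and derived from the best tour length once all ants have finished.

```latex
\begin{aligned}
\text{Q-learning:}\quad & Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a')\right] \\
\text{Ant-Q:}\quad & AQ(i,j) \leftarrow (1-\alpha)\,AQ(i,j) + \alpha\left[\Delta AQ(i,j) + \gamma \max_{z \in J_k(j)} AQ(j,z)\right]
\end{aligned}
```

The structural match is direct. The state-action value $Q(s,a)$ becomes an edge value $AQ(i,j)$, the immediate reward $r$ becomes the ant-generated reinforcement $\Delta AQ$, and the maximization is restricted to feasible next cities.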
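The abstract describes the hierarchical ant colony system only at a high level: cluster the TSP cities, solve each cluster with ACO, and also solve the complete problem. The Python sketch below shows one way such a two-level decomposition could work. It assumes k-means clustering, a simplified Ant Colony System (ACS) at both levels, and a hypothetical stitching rule. Under that rule, an ACS tour over the cluster centroids fixes the cluster order, and each cluster's sub-tour is entered at the city nearest the previous one. All names and parameter values (`hierarchical_tsp`, `acs_tour`, `q0=0.9`, and so on) are illustrative assumptions, not the authors' implementation, and the G-means variant is not shown.

```python
import math
import random

# Illustrative sketch only: clustering, stitching rule and parameters are assumptions,
# not the paper's implementation.

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tour_length(points, tour):
    """Length of the closed tour that visits points in the given index order."""
    return sum(dist(points[tour[i]], points[tour[(i + 1) % len(tour)]]) for i in range(len(tour)))

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of point indices."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(points):
            clusters[min(range(k), key=lambda c: dist(p, centers[c]))].append(i)
        centers = [centroid([points[i] for i in cl]) if cl else centers[c]
                   for c, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]

def acs_tour(points, rng, n_ants=10, n_iters=30, beta=2.0, q0=0.9, xi=0.1, rho=0.1):
    """Simplified Ant Colony System: ants build tours one after another (not in lockstep)."""
    n = len(points)
    if n <= 3:  # every ordering of <= 3 cities gives the same closed-tour length
        return list(range(n))
    eta = [[1.0 / max(dist(a, b), 1e-12) for b in points] for a in points]  # heuristic visibility
    # Nearest-neighbour tour: initial best tour and basis for tau0 = 1 / (n * L_nn).
    nn = [0]
    while len(nn) < n:
        last = nn[-1]
        nn.append(min((j for j in range(n) if j not in nn), key=lambda j: dist(points[last], points[j])))
    best, best_len = nn, tour_length(points, nn)
    tau0 = 1.0 / (n * max(best_len, 1e-12))
    tau = [[tau0] * n for _ in range(n)]
    for _ in range(n_iters):
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = list(unvisited)
                scores = [tau[i][j] * eta[i][j] ** beta for j in cand]
                if rng.random() < q0:  # exploitation: take the best-scoring edge
                    j = cand[max(range(len(cand)), key=scores.__getitem__)]
                else:  # biased exploration: sample proportionally to score
                    j = rng.choices(cand, weights=scores)[0]
                tau[i][j] = tau[j][i] = (1 - xi) * tau[i][j] + xi * tau0  # ACS local update
                tour.append(j)
                unvisited.remove(j)
            length = tour_length(points, tour)
            if length < best_len:
                best, best_len = tour, length
        for a, b in zip(best, best[1:] + best[:1]):  # ACS global update on best-so-far tour
            tau[a][b] = tau[b][a] = (1 - rho) * tau[a][b] + rho / max(best_len, 1e-12)
    return best

def hierarchical_tsp(points, k, rng):
    """Two-level decomposition: ACS orders the k-means clusters, then ACS solves each cluster."""
    clusters = kmeans(points, k, rng)
    order = acs_tour([centroid([points[i] for i in cl]) for cl in clusters], rng)
    tour = []
    for c in order:
        members = clusters[c]
        sub = [members[i] for i in acs_tour([points[m] for m in members], rng)]
        if tour:  # rotate the closed sub-tour so it starts at the city nearest the previous one
            start = min(range(len(sub)), key=lambda i: dist(points[tour[-1]], points[sub[i]]))
            sub = sub[start:] + sub[:start]
        tour.extend(sub)
    return tour

rng = random.Random(0)
cities = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(40)]
flat = acs_tour(cities, rng)
hier = hierarchical_tsp(cities, k=4, rng=rng)
print(f"flat ACS: {tour_length(cities, flat):.1f}  hierarchical: {tour_length(cities, hier):.1f}")
```

Each ACS call sees only one cluster or the k centroids, so tour-construction work drops from roughly n² candidate edges per ant to about n²/k in total. That reduction is the intuition behind the speedup. The cost is that the stitched tour cannot repair poor cluster boundaries, which is consistent with the quality loss the abstract attributes to clustering.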