Graphics Processing Units (GPUs) have evolved into a highly parallel and fully programmable architecture over the past five years, and the advent of CUDA has facilitated their use in many real-world applications. In this paper, we deal with a GPU implementation of Ant Colony Optimization (ACO), a population-based optimization method that comprises two major stages: tour construction and pheromone update. Because of its inherently parallel nature, ACO is well suited to GPU implementation, but it also poses significant challenges due to irregular memory access patterns. Our contribution within this context is threefold: (1) a data parallelism scheme for tour construction tailored to GPUs, (2) novel GPU programming strategies for the pheromone update stage, and (3) a new mechanism called I-Roulette to replicate the classic roulette wheel while improving GPU parallelism. Our implementation achieves speedups exceeding 20x for each of the two stages of the ACO algorithm, as applied to the Traveling Salesman Problem (TSP), compared with its sequential counterpart running on a comparable single-threaded high-end CPU. Moreover, an extensive discussion of different implementation paths on GPUs shows how to deal with parallel graph connected components. This, in turn, suggests a broader area of inquiry, where algorithm designers may learn to adapt similar optimization methods to the GPU architecture.
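For readers unfamiliar with the two stages named above, the following is a minimal sequential sketch in Python of tour construction with the classic roulette wheel, followed by a pheromone update, in the style of the basic Ant System. It is not the GPU implementation described in the abstract and it does not reproduce I-Roulette. All function names and the parameters `alpha`, `beta`, and `rho` are assumptions chosen for illustration, and the sketch assumes strictly positive distances between distinct cities.

```python
import random


def roulette_select(weights, rng):
    """Pick an index with probability proportional to its weight."""
    total = sum(weights)
    threshold = rng.random() * total
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if w > 0 and cumulative >= threshold:
            return i
    # Fallback for floating-point round-off: last index with positive weight.
    return max(i for i, w in enumerate(weights) if w > 0)


def construct_tour(dist, tau, alpha, beta, rng):
    """Build one ant's tour, choosing each next city by roulette wheel."""
    n = len(dist)
    current = rng.randrange(n)
    tour = [current]
    visited = [False] * n
    visited[current] = True
    while len(tour) < n:
        # Weight = pheromone^alpha * (1/distance)^beta; visited cities get 0.
        weights = [
            0.0 if visited[j]
            else (tau[current][j] ** alpha) * ((1.0 / dist[current][j]) ** beta)
            for j in range(n)
        ]
        current = roulette_select(weights, rng)
        visited[current] = True
        tour.append(current)
    return tour


def tour_length(dist, tour):
    """Length of the closed tour (returns to the starting city)."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def update_pheromone(tau, tours, lengths, rho):
    """Evaporate all trails by rho, then deposit 1/length on each tour edge."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)
    for tour, length in zip(tours, lengths):
        deposit = 1.0 / length
        for k in range(len(tour)):
            a, b = tour[k], tour[(k + 1) % len(tour)]
            tau[a][b] += deposit  # symmetric TSP: update both directions
            tau[b][a] += deposit
```

In this sketch the inner loops are where the irregular access arises: each ant reads a different row of `tau` and `dist` depending on its current city, and several ants may deposit pheromone on the same edge, which is the kind of pattern a parallel version has to handle carefully.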