An improved CUDA-based implementation of differential evolution on GPU

Authors:
A. K. Qin; Federico Raimondo; Florence Forbes; Yew Soon Ong
Affiliations:
INRIA Grenoble Rhone-Alpes, Grenoble, France; INRIA Grenoble Rhone-Alpes, Grenoble, France; INRIA Grenoble Rhone-Alpes, Grenoble, France; Nanyang Technological University, Singapore, Singapore
Venue:
Proceedings of the 14th annual conference on Genetic and evolutionary computation
Year:
2012

Abstract

Modern GPUs enable affordable personal computers to carry out massively parallel computation. NVIDIA's CUDA technology provides an easy-to-use parallel computing platform, and many state-of-the-art algorithms from different fields have been redesigned on CUDA to achieve computational speedup. Differential evolution (DE), a promising evolutionary algorithm, is well suited to parallelization owing to its data-parallel algorithmic structure. However, most existing CUDA-based DE implementations suffer from excessive low-throughput memory access and inefficient device utilization. This work presents an improved CUDA-based DE that optimizes memory and device utilization in three ways: several logically related kernels are combined into one composite kernel to reduce global memory access; kernel execution configuration parameters are determined automatically to maximize device occupancy; and streams are employed to enable concurrent kernel execution, maximizing device utilization. Experimental results on several numerical problems demonstrate that the proposed method is more time-efficient than two recent CUDA-based DE implementations and sequential DE across varying problem dimensions and population sizes.
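
To make the "data-parallel algorithmic structure" mentioned in the abstract concrete, here is a minimal sequential sketch in plain Python. It is not the paper's GPU implementation. It assumes the common DE/rand/1/bin variant, which the abstract does not specify, and the objective `sphere`, the helper `evolve_one`, and all parameter names and defaults (`F`, `CR`, `lo`, `hi`, `seed`) are hypothetical choices made for illustration. Mutation, crossover, and selection for one individual are fused into a single function, loosely mirroring the idea of one composite kernel, and each call reads only the previous generation.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def evolve_one(i, pop, fit, f_obj, F, CR, rng):
    """Mutation, crossover, and selection for individual i, fused into one step.

    Apart from the shared random generator, this reads only the previous
    generation (pop, fit), so calls for different i do not depend on each other.
    """
    n, dim = len(pop), len(pop[i])
    # Pick three distinct individuals, all different from i (requires n >= 4).
    r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
    j_rand = rng.randrange(dim)  # forces at least one mutated component
    trial = []
    for j in range(dim):
        if j == j_rand or rng.random() < CR:
            trial.append(pop[r1][j] + F * (pop[r2][j] - pop[r3][j]))
        else:
            trial.append(pop[i][j])
    f_trial = f_obj(trial)
    # Greedy selection: keep the trial only if it is no worse than the parent.
    if f_trial <= fit[i]:
        return trial, f_trial
    return pop[i], fit[i]


def differential_evolution(f_obj, dim=5, pop_size=20, generations=100,
                           F=0.5, CR=0.9, lo=-5.0, hi=5.0, seed=0):
    """Run DE/rand/1/bin and return (best vector, best fitness, best-fitness history)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f_obj(x) for x in pop]
    history = [min(fit)]
    for _ in range(generations):
        # Each element of this list is the unit of work that a parallel
        # version could hand to a separate thread or thread block.
        results = [evolve_one(i, pop, fit, f_obj, F, CR, rng)
                   for i in range(pop_size)]
        pop = [r[0] for r in results]
        fit = [r[1] for r in results]
        history.append(min(fit))
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best], history
```

Calling `differential_evolution(sphere)` returns the best vector found, its fitness, and the per-generation best fitness. Because selection is greedy, the recorded best fitness never increases from one generation to the next. A parallel port would also need independent random number streams per thread, which this sketch sidesteps by sharing a single generator.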