We study how to implement local search efficiently on data-parallel accelerators such as Graphics Processing Units. The Distance-constrained Capacitated Vehicle Routing Problem, a computationally very hard discrete optimization problem with high industrial relevance, is the selected vehicle for our investigations. More precisely, we investigate local search with the Best Improving strategy for the 2-opt and 3-opt operators on a giant tour representation. Resource extension functions are used for constant-time move evaluation. Using CUDA, a basic implementation called The Benchmark Version has been developed and deployed on a Fermi-architecture Graphics Processing Unit. Both neighborhood setup and evaluation are performed entirely on the device. The Benchmark Version is the initial step of an incremental improvement process in which a number of important implementation aspects have been systematically studied. Ten well-known test instances from the literature are used in computational experiments, and profiling tools are used to identify bottlenecks. In the final version, the device is fully saturated, given a large enough problem instance. A speedup of almost an order of magnitude relative to The Benchmark Version is observed. We conclude that, with some effort, local search may be implemented very efficiently on Graphics Processing Units. Our experiments show, however, that maximum efficiency requires a neighborhood cardinality of at least one million. Full exploration of a billion neighbors takes a few seconds and may be deemed too expensive with the current technology. Reducing neighborhoods through filtering is an obvious remedy. Experiments on simple models of neighborhood filtering indicate, however, that the speedup effect is limited on data-parallel accelerators. We believe these insights are valuable in the design of new metaheuristics that fully utilize modern, heterogeneous processors.
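To make the Best Improving strategy for 2-opt concrete, the following is a minimal sequential sketch in Python, not the CUDA implementation described above. It rests on several simplifying assumptions: a single closed tour instead of a giant tour with several routes, plain Euclidean distance instead of resource extension functions, and no capacity or distance constraints. The names `index_to_move`, `two_opt_delta`, `best_improving_step`, and `local_search` are hypothetical. The linear index `k` plays the role a thread index would play on a data-parallel device: each index maps to exactly one move, and each move is evaluated in constant time.

```python
import math


def index_to_move(k):
    """Map a linear move index k >= 0 to a pair (a, b) with 0 <= a < b.

    On a data-parallel device, k would typically be derived from the
    thread index (an assumption of this sketch).
    """
    b = (1 + math.isqrt(1 + 8 * k)) // 2
    a = k - b * (b - 1) // 2
    return a, b


def dist(p, q):
    """Euclidean distance between two points given as (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def tour_length(points, tour):
    """Length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def two_opt_delta(points, tour, a, b):
    """Constant-time cost change of reversing tour[a+1 .. b]."""
    n = len(tour)
    pa, pa1 = points[tour[a]], points[tour[a + 1]]
    pb, pb1 = points[tour[b]], points[tour[(b + 1) % n]]
    return dist(pa, pb) + dist(pa1, pb1) - dist(pa, pa1) - dist(pb, pb1)


def best_improving_step(points, tour, eps=1e-12):
    """Evaluate the full 2-opt neighborhood and apply the best move.

    Returns True if an improving move was applied, False otherwise.
    """
    n = len(tour)
    num_moves = n * (n - 1) // 2
    best_delta, best_move = 0.0, None
    for k in range(num_moves):  # each iteration is independent of the others
        a, b = index_to_move(k)
        delta = two_opt_delta(points, tour, a, b)
        if delta < best_delta - eps:
            best_delta, best_move = delta, (a, b)
    if best_move is None:
        return False
    a, b = best_move
    tour[a + 1:b + 1] = reversed(tour[a + 1:b + 1])
    return True


def local_search(points, tour):
    """Repeat best-improving steps until a local optimum is reached."""
    tour = list(tour)
    while best_improving_step(points, tour):
        pass
    return tour
```

Because the loop iterations in `best_improving_step` do not depend on each other, the evaluation could in principle be distributed over threads, followed by a minimum reduction to select the best move. The neighborhood size grows as n(n-1)/2 in this sketch, so roughly 1,415 nodes are needed before the 2-opt neighborhood reaches the one million moves mentioned above.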