Efficient local search on the GPU - Investigations on the vehicle routing problem

Authors:
Christian Schulz
Affiliations:
-
Venue:
Journal of Parallel and Distributed Computing
Year:
2013

Abstract

We study how to implement local search efficiently on data-parallel accelerators such as Graphics Processing Units. The Distance-constrained Capacitated Vehicle Routing Problem, a computationally very hard discrete optimization problem with high industrial relevance, is the selected vehicle for our investigations. More precisely, we investigate local search with the Best Improving strategy for the 2-opt and 3-opt operators on a giant tour representation. Resource extension functions are used for constant-time move evaluation. Using CUDA, a basic implementation called The Benchmark Version has been developed and deployed on a Fermi-architecture Graphics Processing Unit. Both neighborhood setup and evaluation are performed entirely on the device. The Benchmark Version is the initial step of an incremental improvement process in which a number of important implementation aspects are systematically studied. Ten well-known test instances from the literature are used in computational experiments, and profiling tools are used to identify bottlenecks. In the final version, the device is fully saturated, given a large enough problem instance. A speedup of almost an order of magnitude relative to The Benchmark Version is observed. We conclude that, with some effort, local search may be implemented very efficiently on Graphics Processing Units. Our experiments show, however, that maximum efficiency requires a neighborhood cardinality of at least one million. Full exploration of a billion neighbors takes a few seconds and may be deemed too expensive with current technology. Reducing neighborhoods through filtering is an obvious remedy. Experiments on simple models of neighborhood filtering indicate, however, that the speedup effect is limited on data-parallel accelerators. We believe these insights are valuable in the design of new metaheuristics that fully utilize modern, heterogeneous processors.
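The Best Improving strategy for 2-opt described in the abstract can be illustrated with a minimal sequential sketch. It is not the paper's implementation. It assumes a plain symmetric Euclidean tour and ignores capacity and distance constraints, so a simple edge-length delta stands in for the resource extension functions. The flat move index `k` plays the role that a thread index might play on the device, and `min` stands in for a parallel reduction. All function names are hypothetical.

```python
import math


def move_from_index(k):
    """Map a flat index k >= 0 to a pair (i, j) with 0 <= i < j.

    Assumption: on a GPU, k could be a global thread index, so that each
    thread evaluates exactly one 2-opt move.
    """
    j = (1 + math.isqrt(1 + 8 * k)) // 2
    i = k - j * (j - 1) // 2
    return i, j


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def two_opt_delta(tour, pts, i, j):
    """Constant-time change in tour length if tour[i+1..j] is reversed."""
    n = len(tour)
    a, b = pts[tour[i]], pts[tour[i + 1]]
    c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
    return dist(a, c) + dist(b, d) - dist(a, b) - dist(c, d)


def tour_length(tour, pts):
    """Length of the closed tour."""
    n = len(tour)
    return sum(dist(pts[tour[k]], pts[tour[(k + 1) % n]]) for k in range(n))


def best_improving_2opt(tour, pts, eps=1e-9):
    """Repeat: evaluate every move, apply the best one, stop at a local optimum."""
    tour = list(tour)
    n = len(tour)
    if n < 3:
        return tour
    n_moves = n * (n - 1) // 2  # neighborhood cardinality
    while True:
        # Full neighborhood evaluation followed by a minimum reduction.
        best_delta, best_k = min(
            (two_opt_delta(tour, pts, *move_from_index(k)), k)
            for k in range(n_moves)
        )
        if best_delta >= -eps:  # no strictly improving move remains
            return tour
        i, j = move_from_index(best_k)
        tour[i + 1 : j + 1] = tour[i + 1 : j + 1][::-1]


# Usage: untangle a crossed tour over the corners of the unit square.
pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
improved = best_improving_2opt([0, 2, 1, 3], pts)
```

Because every move is evaluated independently and in constant time, the inner loop maps naturally onto one thread per move. The neighborhood grows quadratically with tour length for 2-opt, which is why large instances are needed before the neighborhood reaches the cardinalities the abstract mentions.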