A Parallel Algorithm for Graph Matching and Its MasPar Implementation
IEEE Transactions on Parallel and Distributed Systems
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Branch-and-Bound interval global optimization on shared memory multiprocessors
Optimization Methods & Software - THE JOINT EUROPT-OMS CONFERENCE ON OPTIMIZATION, 4-7 JULY, 2007, PRAGUE, CZECH REPUBLIC, PART I
A New Parallel Schema for Branch-and-Bound Algorithms Using GPGPU
SBAC-PAD '11 Proceedings of the 2011 23rd International Symposium on Computer Architecture and High Performance Computing
GPU Implementation of the Branch and Bound Method for Knapsack Problems
IPDPSW '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
A GPU-accelerated Branch-and-Bound Algorithm for the Flow-Shop Scheduling Problem
CLUSTER '12 Proceedings of the 2012 IEEE International Conference on Cluster Computing
Hi-index | 0.00 |
In this paper, we revisit the design and implementation of Branch-and-Bound (B&B) algorithms for solving large combinatorial optimization problems on GPU-enhanced multi-core machines. B&B is a tree-based optimization method that uses four operators (selection, branching, bounding and pruning) to build and explore a highly irregular tree representing the solution space. In our previous works, we have proposed a GPU-accelerated approach in which only a single CPU core is used and only the bounding operator is performed on the GPU device. Here, we extend the approach (LL-GB&B) in order to minimize the CPU-GPU communication latency and thread divergence. Such an objective is achieved through a GPU-based fine-grained parallelization of the branching and pruning operators in addition to the bounding one. The second contribution consists in investigating the combination of a GPU with multi-core processing. Two scenarios have been explored leading to two approaches: a concurrent (RLL-GB&B) and a cooperative one (PLL-GB&B). In the first one, the exploration process is performed concurrently by the GPU and the CPU cores. In the cooperative approach, the CPU cores prepare and off-load to GPU pools of tree nodes using data streaming while the GPU performs the exploration. The different approaches have been extensively experimented on the Flowshop scheduling problem. Compared to a single CPU-based execution, LL-GB&B allows accelerations up to (x160) for large problem instances. Moreover, when combining multi-core and GPU, we figure out that using RLL-GB&B is not beneficial while PLL-GB&B enables an improvement up to 36% compared to LL-GB&B.