A memory access model for highly-threaded many-core architectures
Future Generation Computer Systems
This paper presents a blocked algorithm for the all-pairs shortest paths (APSP) problem on a hybrid CPU-GPU system. The blocked APSP algorithm minimizes the amount of data communicated between CPU (host) memory and GPU memory. When the problem size (the number of vertices in the graph) is sufficiently large relative to the blocking factor, the algorithm effectively requires only two block matrices to be exchanged (CPU$\rightleftharpoons$GPU) for each block computation on the GPU. We also estimate the memory and communication bandwidth required to utilize the GPU efficiently. On a system with an Intel Westmere CPU (Core i7 970) and an AMD Cypress GPU (Radeon HD 5870), our implementation of the blocked APSP algorithm achieves up to 1 TFlop/s in single precision.
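To make the idea of a blocked APSP computation concrete, the following is a minimal CPU-only sketch. It assumes the standard three-phase blocked Floyd-Warshall formulation, which the abstract does not confirm as the paper's exact method; the names `blocked_apsp`, `relax_block`, and the parameter `b` (blocking factor) are hypothetical, and no GPU offload or host-device transfer is modeled.

```python
import math

INF = math.inf  # marks "no edge" in the distance matrix


def floyd_warshall(dist):
    """Reference (unblocked) Floyd-Warshall; returns a new matrix."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def relax_block(d, n, b, bi, bj, bk):
    """Relax block (bi, bj) through the intermediate vertices of block bk."""
    for k in range(bk * b, min(bk * b + b, n)):
        for i in range(bi * b, min(bi * b + b, n)):
            for j in range(bj * b, min(bj * b + b, n)):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]


def blocked_apsp(dist, b):
    """Blocked Floyd-Warshall with blocking factor b (assumed formulation).

    Assumes no negative cycles and a zero diagonal in dist.
    """
    n = len(dist)
    d = [row[:] for row in dist]
    nb = (n + b - 1) // b  # number of blocks per dimension
    for bk in range(nb):
        # Phase 1: the diagonal block depends only on itself.
        relax_block(d, n, b, bk, bk, bk)
        # Phase 2: blocks in the same block row and block column.
        for x in range(nb):
            if x != bk:
                relax_block(d, n, b, bk, x, bk)  # row block (bk, x)
                relax_block(d, n, b, x, bk, bk)  # column block (x, bk)
        # Phase 3: every remaining block (bi, bj) reads (bi, bk) and (bk, bj).
        for bi in range(nb):
            for bj in range(nb):
                if bi != bk and bj != bk:
                    relax_block(d, n, b, bi, bj, bk)
    return d
```

In this sketch, each phase-3 update of block (bi, bj) reads exactly two other blocks, (bi, bk) and (bk, bj), and those updates are independent of one another within a round, which is what makes this phase a natural candidate for offloading to an accelerator.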