A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method

  • Authors:
  • Jee Choi;Aparna Chandramowlishwaran;Kamesh Madduri;Richard Vuduc

  • Affiliations:
  • Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA;CSAIL, Massachusetts Institute of Technology, Cambridge, MA;Computer Science and Engineering, Pennsylvania State University, University Park, PA;Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA

  • Venue:
  • Proceedings of Workshop on General Purpose Processing Using GPUs
  • Year:
  • 2014

Abstract

This paper presents an optimized CPU-GPU hybrid implementation and a GPU performance model for the kernel-independent fast multipole method (FMM). We implement an optimized kernel-independent FMM for GPUs and combine it with our previous CPU implementation to create a hybrid CPU+GPU FMM kernel. Compared with another highly optimized GPU implementation, our implementation achieves as much as a 1.9× speedup. We then extend our previous lower-bound analyses of FMM for CPUs to include GPUs. This yields a model for predicting the execution times of the different phases of FMM. Using these predictions, we estimate the execution times of a set of static hybrid schedules on a given system, which allows us to automatically choose the schedule that yields the best performance. In the best case, we achieve a speedup of 1.5× over our GPU-only implementation, despite the large difference in computational power between CPUs and GPUs. We also comment on one consequence of having such performance models: they enable speculative predictions about FMM scalability on future systems.
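The schedule selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' scheduler. It assumes that each FMM phase has a predicted time on each device, that phases assigned to different devices overlap fully, and that dependencies between phases and data-transfer costs can be ignored. The phase names, the function name `best_static_schedule`, and all timings are hypothetical placeholders, not values from the paper.

```python
from itertools import product


def best_static_schedule(cpu_time, gpu_time):
    """Pick the phase-to-device assignment with the smallest predicted makespan.

    cpu_time, gpu_time: dicts mapping phase name -> predicted seconds.
    Simplifying assumption: CPU and GPU work overlap perfectly, so the
    makespan is the larger of the two per-device totals.
    """
    phases = sorted(cpu_time)
    best = None
    # Enumerate every static assignment of phases to the two devices.
    for devices in product(("cpu", "gpu"), repeat=len(phases)):
        cpu_total = sum(cpu_time[p] for p, d in zip(phases, devices) if d == "cpu")
        gpu_total = sum(gpu_time[p] for p, d in zip(phases, devices) if d == "gpu")
        makespan = max(cpu_total, gpu_total)
        if best is None or makespan < best[0]:
            best = (makespan, dict(zip(phases, devices)))
    return best


# Placeholder predictions in seconds, invented purely for illustration.
cpu_time = {"up": 1.0, "u_list": 8.0, "v_list": 3.0, "down": 1.0}
gpu_time = {"up": 0.5, "u_list": 2.0, "v_list": 2.5, "down": 0.5}

makespan, assignment = best_static_schedule(cpu_time, gpu_time)
print(makespan, assignment)
```

With these placeholder numbers, the sketch moves one phase to the CPU even though the GPU is faster on every phase in isolation, which mirrors the abstract's point that a hybrid schedule can beat a GPU-only one. Exhaustive enumeration is only practical here because the number of phases is small.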