A Multi-GPU Implementation of a D2Q37 Lattice Boltzmann Code

  • Authors:
  • Luca Biferale; Filippo Mantovani; Marcello Pivanti; Fabio Pozzati; Mauro Sbragaglia; Andrea Scagliarini; Sebastiano Fabio Schifano; Federico Toschi; Raffaele Tripiccione

  • Affiliations:
  • University of Tor Vergata and INFN, Roma, Italy; Deutsches Elektronen Synchrotron (DESY), Zeuthen, Germany; University of Ferrara and INFN, Ferrara, Italy; Fondazione Bruno Kessler Trento, Trento, Italy; University of Tor Vergata and INFN, Roma, Italy; University of Barcelona, Barcelona, Spain; University of Ferrara and INFN, Ferrara, Italy; Eindhoven University of Technology, The Netherlands, and CNR-IAC, Rome, Italy; University of Ferrara and INFN, Ferrara, Italy

  • Venue:
  • PPAM'11: Proceedings of the 9th International Conference on Parallel Processing and Applied Mathematics - Volume Part I
  • Year:
  • 2011

Abstract

We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluster based on Nvidia Fermi processors. We analyze how to optimize the algorithm for GP-GPU architectures, describe the implementation choices we adopted, and compare our performance with that of an implementation optimized for latest-generation multi-core CPUs. Our program runs at ≈30% of the double-precision peak performance of one GPU and shows almost linear scaling when run on the multi-GPU cluster.
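
The two figures of merit quoted in the abstract (fraction of peak performance and scaling across GPUs) can be illustrated with a minimal sketch. The function names, the peak value, and the timings below are hypothetical placeholders chosen only to show the arithmetic; they are not measurements from the paper, and the sketch assumes strong scaling (fixed total lattice size), which the abstract does not specify.

```python
def fraction_of_peak(sustained_gflops: float, peak_gflops: float) -> float:
    """Return sustained performance as a fraction of the hardware peak."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return sustained_gflops / peak_gflops


def parallel_efficiency(t_single: float, t_multi: float, n_gpus: int) -> float:
    """Strong-scaling efficiency: speedup on n_gpus divided by n_gpus.

    A value of 1.0 corresponds to perfectly linear scaling.
    """
    if t_single <= 0 or t_multi <= 0 or n_gpus < 1:
        raise ValueError("timings must be positive and n_gpus >= 1")
    speedup = t_single / t_multi
    return speedup / n_gpus


if __name__ == "__main__":
    # Hypothetical double-precision peak of one GPU (placeholder value).
    peak = 500.0
    # Hypothetical sustained rate corresponding to roughly 30% of that peak.
    sustained = 150.0
    print(f"fraction of peak: {fraction_of_peak(sustained, peak):.2f}")

    # Hypothetical wall-clock times for the same lattice on 1 and 4 GPUs.
    print(f"efficiency on 4 GPUs: {parallel_efficiency(100.0, 27.0, 4):.2f}")
```

An efficiency close to 1.0 is what "almost linear scaling" would mean under this definition; under weak scaling (lattice size growing with the number of GPUs) the efficiency would instead be the ratio of the single-GPU time to the multi-GPU time.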