Improved GROMACS scaling on Ethernet switched clusters

  • Authors:
  • Carsten Kutzner; David van der Spoel; Martin Fechner; Erik Lindahl; Udo W. Schmitt; Bert L. de Groot; Helmut Grubmüller

  • Affiliations:
  • Department of Theoretical and Computational Biophysics, Max-Planck-Institute of Biophysical Chemistry, Göttingen, Germany (Kutzner, Fechner, Schmitt, de Groot, Grubmüller)
  • Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden (van der Spoel)
  • Stockholm Bioinformatics Center, SCFAB, Stockholm University, Stockholm, Sweden (Lindahl)

  • Venue:
  • EuroPVM/MPI'06: Proceedings of the 13th European PVM/MPI User's Group Conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
  • Year:
  • 2006

Abstract

We investigated the prerequisites for decent scaling of the GROMACS 3.3 molecular dynamics (MD) code [1] on Ethernet Beowulf clusters. The code uses the MPI standard for communication between the processors and scales well on shared-memory supercomputers like the IBM p690 (Regatta) and on Linux clusters with a high-bandwidth, low-latency network. On Ethernet switched clusters, however, the scaling typically breaks down as soon as more than two computational nodes are involved. For an 80,000-atom MD test system, exemplary speedups Sp_N on N CPUs are Sp_8 = 6.2 and Sp_16 = 10 on a Myrinet dual-CPU 3 GHz Xeon cluster, Sp_16 = 11 on an Infiniband dual-CPU 2.2 GHz Opteron cluster, and Sp_32 = 21 on one Regatta node. However, the maximum speedup we could initially reach on our Gbit Ethernet 2 GHz Opteron cluster was Sp_4 = 3, using two dual-CPU nodes; employing more CPUs only led to slower execution (Table 1).
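
For readers comparing the quoted figures, the following is a minimal worked example, assuming the standard definitions of speedup and parallel efficiency; the abstract does not spell out what Sp_N denotes, so taking T_1 as the single-CPU wall-clock time is an assumption here:

    % Speedup on N CPUs, and the corresponding parallel efficiency
    S_p(N) = \frac{T_1}{T_N}, \qquad E(N) = \frac{S_p(N)}{N}

    % Plugging in the numbers quoted in the abstract:
    % Myrinet:    E(16) = 10/16 \approx 0.63
    % Infiniband: E(16) = 11/16 \approx 0.69
    % Regatta:    E(32) = 21/32 \approx 0.66
    % Ethernet:   E(4)  = 3/4   =       0.75

On this measure the Gbit Ethernet cluster is competitive as long as only two dual-CPU nodes are used; beyond that point E(N) collapses, since the abstract reports that adding CPUs increases the absolute runtime.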