Improved GROMACS scaling on Ethernet switched clusters

  • Authors:
  • Carsten Kutzner; David van der Spoel; Martin Fechner; Erik Lindahl; Udo W. Schmitt; Bert L. de Groot; Helmut Grubmüller

  • Affiliations:
  • Department of Theoretical and Computational Biophysics, Max-Planck-Institute of Biophysical Chemistry, Göttingen, Germany (Kutzner, Fechner, Schmitt, de Groot, Grubmüller)
  • Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden (van der Spoel)
  • Stockholm Bioinformatics Center, SCFAB, Stockholm University, Stockholm, Sweden (Lindahl)

  • Venue:
  • EuroPVM/MPI'06: Proceedings of the 13th European PVM/MPI User's Group Conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
  • Year:
  • 2006

Abstract

We investigated the prerequisites for decent scaling of the GROMACS 3.3 molecular dynamics (MD) code [1] on Ethernet Beowulf clusters. The code uses the MPI standard for communication between the processors and scales well on shared-memory supercomputers like the IBM p690 (Regatta) and on Linux clusters with a high-bandwidth, low-latency network. On Ethernet switched clusters, however, the scaling typically breaks down as soon as more than two computational nodes are involved. For an 80,000-atom MD test system, exemplary speedups Sp_N on N CPUs are Sp_8 = 6.2 and Sp_16 = 10 on a Myrinet dual-CPU 3 GHz Xeon cluster, Sp_16 = 11 on an Infiniband dual-CPU 2.2 GHz Opteron cluster, and Sp_32 = 21 on one Regatta node. However, the maximum speedup we could initially reach on our Gbit Ethernet 2 GHz Opteron cluster was Sp_4 = 3, using two dual-CPU nodes; employing more CPUs only led to slower execution (Table 1).
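
For readers comparing the quoted figures, the following is a minimal worked example, assuming the standard definitions of speedup and parallel efficiency; the abstract does not spell out what Sp_N denotes, so taking T_1 as the single-CPU wall-clock time is an assumption here:

    % Speedup on N CPUs, and the corresponding parallel efficiency
    S_p(N) = \frac{T_1}{T_N}, \qquad E(N) = \frac{S_p(N)}{N}

    % Plugging in the numbers quoted in the abstract:
    % Myrinet:    E(16) = 10/16 \approx 0.63
    % Infiniband: E(16) = 11/16 \approx 0.69
    % Regatta:    E(32) = 21/32 \approx 0.66
    % Ethernet:   E(4)  = 3/4   =       0.75

On this measure the Gbit Ethernet cluster is competitive as long as only two dual-CPU nodes are used; beyond that point E(N) collapses, since the abstract reports that adding CPUs increases the absolute runtime.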