An MPI prototype for compiled communication on Ethernet switched clusters
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
Hi-index | 0.00 |
We investigated the prerequisites for decent scaling of the GROMACS 3.3 molecular dynamics (MD) code [1] on Ethernet Beowulf clusters. The code uses the MPI standard for communication between the processors and scales well on shared memory supercomputers like the IBM p690 (Regatta) and on Linux clusters with a high-bandwidth/low latency network. On Ethernet switched clusters, however, the scaling typically breaks down as soon as more than two computational nodes are involved. For an 80k atom MD test system, exemplary speedups SpN on N CPUs are Sp8 = 6.2, Sp16 = 10 on a Myrinet dual-CPU 3 GHz Xeon cluster, Sp16 = 11 on an Infiniband dual-CPU 2.2 GHz Opteron cluster, and Sp32 = 21 on one Regatta node. However, the maximum speedup we could initially reach on our Gbit Ethernet 2 GHz Opteron cluster was Sp4 = 3 using two dual-CPU nodes. Employing more CPUs only led to slower execution (Table 1).