Communication performance issues for two cluster computers

Authors:
Francis A. Vaughan;Duncan A. Grove;Paul D. Coddington
Affiliations:
Department of Computer Science, University of Adelaide, Adelaide, SA 5005, Australia;Department of Computer Science, University of Adelaide, Adelaide, SA 5005, Australia;Department of Computer Science, University of Adelaide, Adelaide, SA 5005, Australia
Venue:
ACSC '03 Proceedings of the 26th Australasian computer science conference - Volume 16
Year:
2003

Citing 7
Cited 1

On efficiently implementing global time for performance evaluation on multiprocessor systems

Journal of Parallel and Distributed Computing
98¢/Mflops/s ultra-large-scale neural-network training on a pIII cluster

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Reducing the Variance of Point-to-Point Transfers for Parallel Real-Time Programs

IEEE Parallel & Distributed Technology: Systems & Technology
Myrinet: A Gigabit-per-Second Local Area Network

IEEE Micro
Performance Evaluation of the Quadrics Interconnection Network

Cluster Computing
A Design Study of Alternative Network Topologies for the Beowulf Parallel Workstation

HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
The Analysis and Optimization of Collective Communications on a Beowulf Cluster

ICPADS '02 Proceedings of the 9th International Conference on Parallel and Distributed Systems

Averages, distributions and scalability of MPI communication times for Ethernet and Myrinet networks

PDCN'07 Proceedings of the 25th conference on Proceedings of the 25th IASTED International Multi-Conference: parallel and distributed computing and networks

Quantified Score

Hi-index	0.00

Visualization

Abstract

Clusters of commodity machines have become a popular way of building cheap high performance parallel computers. Many of these designs rely on standard Ethernet networks as a system interconnect. We have profiled the performance of some standard message passing communication on commodity clusters using MPIBench, a tool for benchmarking the performance of MPI routines that uses a highly accurate, globally synchronised clock. The results suggest that existing methodologies of performance characterisation are inadequate. Tests were performed on two clusters, one with a conventional network architecture of switches connected via a high bandwidth backbone, the other with a tetrahedral network topology that potentially provides for lower contention and higher bandwidth. Where packet loss does not occur, performance in either system is good and degrades smoothly with load. However, packet loss is found to occur at any load and the consequent invocation of the TCP/IP timeout and congestion control mechanisms affect performance to a much greater than expected level. The nature of many parallel programs causes overall performance to drop to the worst case rather than the average. The value of MPIBench in profiling communication in parallel systems is clearly demonstrated, particularly through its generation of probability distributions which allow detailed analyses of performance problems.