In this paper we present an in-depth description of the Quadrics interconnection network (QsNET) and an experimental performance evaluation on a 64-node AlphaServer cluster. We explore several performance dimensions and scaling properties of the network using a collection of benchmarks based on different traffic patterns. Experiments with permutation patterns and uniform traffic illustrate the basic characteristics of the interconnect under conditions commonly created by parallel scientific applications. We also analyze the behavior of the QsNET under I/O traffic, including the influence of I/O server placement and the effects of using dedicated versus shared I/O nodes. In addition, we evaluate how background I/O traffic interferes with other parallel applications running concurrently. The experimental results indicate that the QsNET provides excellent performance in most cases, with effective contention resolution mechanisms. We also give guidelines for mapping applications and I/O servers on large-scale clusters.
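To make the two traffic classes named in the abstract concrete, the following minimal sketch generates destination lists for a 64-node system. It is an illustration only: the abstract does not say which permutations the benchmarks use, so the shift and bit-reversal patterns here are assumptions drawn from common practice in interconnect evaluation, and all function names are hypothetical.

```python
import random
from collections import Counter


def uniform_destinations(n, rng):
    """Uniform traffic: each node picks any other node with equal probability."""
    return [rng.choice([d for d in range(n) if d != src]) for src in range(n)]


def shift_permutation(n, k):
    """Permutation traffic (assumed example): node i sends to node (i + k) mod n."""
    return [(src + k) % n for src in range(n)]


def bit_reversal_permutation(n):
    """Permutation traffic (assumed example): node i sends to the node whose
    index is i with its bits reversed. Requires n to be a power of two.
    Nodes with palindromic indices send to themselves."""
    bits = n.bit_length() - 1
    if n < 2 or (1 << bits) != n:
        raise ValueError("n must be a power of two and at least 2")
    return [int(format(src, f"0{bits}b")[::-1], 2) for src in range(n)]


def max_receiver_load(dests):
    """Largest number of senders targeting the same destination node."""
    return max(Counter(dests).values())


if __name__ == "__main__":
    n = 64  # cluster size mentioned in the abstract
    rng = random.Random(0)
    print("shift:", max_receiver_load(shift_permutation(n, 1)))
    print("bit reversal:", max_receiver_load(bit_reversal_permutation(n)))
    print("uniform:", max_receiver_load(uniform_destinations(n, rng)))
```

In a permutation pattern every node receives from exactly one sender, so any contention comes from shared links inside the network rather than from the endpoints. Under uniform traffic several senders can pick the same destination, which adds endpoint contention as well.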