Many scientific applications make intensive use of MPI collective communication, so efficient and scalable implementations of collective operations are critical to the performance of such applications on clusters. Quadrics QsNetII is a high-performance cluster interconnect that implements some collectives at the Elan level, and the corresponding MPI collectives use them directly. Quadrics software supports point-to-point striping over multi-rail QsNetII networks, but it does not support multi-rail collectives. In this work, we propose a number of RDMA-based multi-port collectives, implemented directly at the Elan level, for multi-rail QsNetII clusters. Our performance results indicate that the proposed multi-port gather outperforms the native elan_gather by a factor of up to 6.35 for 1 MB messages, and that the proposed multi-port all-to-all outperforms the native elan_alltoall by a factor of 2.19 for 16 KB messages. We also propose two algorithms for the scatter operation.
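To make the two ideas in the abstract concrete, the following is a minimal Python sketch and not the authors' Elan-level implementation. It assumes one simple reading of each term: "striping" splits one message into nearly equal contiguous chunks, one per rail, and a "multi-port gather" lets the root receive from as many senders per round as it has ports. The function names `stripe` and `multiport_gather_schedule` are hypothetical and belong to no Quadrics or MPI API.

```python
def stripe(message: bytes, num_rails: int) -> list[bytes]:
    """Split a message into num_rails nearly equal contiguous chunks.

    Illustrative assumption: one chunk per rail, with the first
    (len % num_rails) chunks one byte longer than the rest.
    """
    if num_rails < 1:
        raise ValueError("num_rails must be at least 1")
    base, extra = divmod(len(message), num_rails)
    chunks = []
    offset = 0
    for rail in range(num_rails):
        size = base + (1 if rail < extra else 0)
        chunks.append(message[offset:offset + size])
        offset += size
    return chunks


def multiport_gather_schedule(num_procs: int, num_ports: int, root: int = 0) -> list[list[int]]:
    """Return gather rounds; each round lists the ranks sending to root at once.

    Illustrative assumption: the root can receive from up to num_ports
    senders concurrently, so about (num_procs - 1) / num_ports rounds are needed.
    """
    if num_procs < 1 or num_ports < 1:
        raise ValueError("num_procs and num_ports must be at least 1")
    if not 0 <= root < num_procs:
        raise ValueError("root must be a valid rank")
    senders = [rank for rank in range(num_procs) if rank != root]
    return [senders[i:i + num_ports] for i in range(0, len(senders), num_ports)]


if __name__ == "__main__":
    print(stripe(b"abcdefg", 2))
    print(multiport_gather_schedule(8, 2))
```

Under these assumptions, doubling the number of ports halves the number of gather rounds. That is only a rough intuition for why multi-rail hardware can help collectives and says nothing about the measured speedups reported above.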