Over the past decade, much effort has been devoted to the software overhead problem, which is known as the major hindrance to high-speed communication. However, this paper shows that a low-latency communication system does not by itself guarantee high performance, because low latency leaves other communication issues unresolved, such as contention and the scheduling of communication events. Using the complete exchange operation as a case study, we show that carefully designed communication schedules can achieve efficient communication and also prevent congestion. We have developed a complete exchange algorithm, the Synchronous Shuffle Exchange, which is optimal on a non-blocking network. To avoid the congestion loss caused by non-deterministic delays in communication events, we introduce a global congestion control scheme. This scheme coordinates all participating nodes to monitor and regulate the traffic load, which effectively avoids congestion loss while maintaining enough throughput to maximize performance. To improve the effectiveness of the congestion control scheme on a hierarchical network, we use information about the network topology to devise a contention-aware permutation. This permutation scheme generates a communication schedule that is free of both node and switch contention and that distributes the network load more evenly across the hierarchy. This relieves congestion build-up at the uplink ports and improves the synchronism of the traffic information exchange between cluster nodes. We report performance results for our implementation on a 32-node cluster under various network configurations.
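As a rough illustration of what a node contention-free schedule for complete exchange can look like, the sketch below builds a simple cyclic-shift schedule and checks it. This is a generic textbook-style construction and is not the Synchronous Shuffle Exchange described above, whose details the abstract does not give; it also ignores switch contention and congestion control. The function names (`shift_schedule`, `is_node_contention_free`, `covers_all_pairs`) are hypothetical and chosen only for this example.

```python
def shift_schedule(num_nodes):
    """Build a cyclic-shift schedule for complete exchange.

    Assumption: this is a generic construction, not the paper's algorithm.
    In round k (k = 1 .. num_nodes - 1), node i sends to (i + k) mod num_nodes.
    Returns a list of rounds; each round is a list of (sender, receiver) pairs.
    """
    rounds = []
    for step in range(1, num_nodes):
        rounds.append([(src, (src + step) % num_nodes) for src in range(num_nodes)])
    return rounds


def is_node_contention_free(round_pairs):
    """True if every node sends at most once and receives at most once."""
    senders = [src for src, _ in round_pairs]
    receivers = [dst for _, dst in round_pairs]
    return (len(set(senders)) == len(senders)
            and len(set(receivers)) == len(receivers))


def covers_all_pairs(rounds, num_nodes):
    """True if the rounds together contain every ordered pair of distinct nodes."""
    seen = {pair for round_pairs in rounds for pair in round_pairs}
    expected = {(s, d) for s in range(num_nodes)
                for d in range(num_nodes) if s != d}
    return seen == expected


if __name__ == "__main__":
    schedule = shift_schedule(32)  # 32 nodes, matching the cluster size in the abstract
    print(len(schedule), "rounds")
    print(all(is_node_contention_free(r) for r in schedule))
    print(covers_all_pairs(schedule, 32))
```

Each round of this schedule is a permutation of the nodes, so no node is the target of two simultaneous messages. Avoiding contention inside switches and on uplink ports would additionally require topology information, which is what the contention-aware permutation in the abstract addresses.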