The supercomputer market is now dominated by parallel architectures, among which massively parallel computers (MPCs) are an important class of systems. The memory of an MPC is physically distributed among an ensemble of computing nodes that communicate by sending data through a network. Communication operations can be either point-to-point, with one source and one destination, or collective, with more than two participating processes. The design of collective communication operations depends on the MPC's underlying network architecture. While there has been little consensus on some aspects of communication architectures, such as network topology, a good deal of agreement exists regarding the most efficient way to switch messages through the network. Most MPCs use wormhole routing, in which each message is divided into small pieces that are pipelined through the network. Compared with the store-and-forward switching method used in early multicomputers, wormhole routing reduces the effect of path length on communication time. However, when multiple messages are in the network concurrently, wormhole routing can exacerbate channel contention, which occurs when blocked messages hold some communication channels while waiting for others. A collective operation, which can involve many messages, creates exactly this situation. In recent years, many projects have addressed the design of efficient collective communication algorithms for wormhole-routed systems. By exploiting the relative distance-insensitivity of wormhole routing, these new algorithms often differ fundamentally from their store-and-forward counterparts. This article examines software and hardware approaches to implementing collective communication operations, illustrating several issues that arise in this research area and describing the major classes of algorithms proposed to address them.
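The distance-insensitivity claim can be illustrated with a first-order latency model. The sketch below is not taken from the article: it assumes a contention-free network, ignores startup and routing-decision costs, and uses hypothetical function and parameter names. Under those assumptions, a store-and-forward message is fully received at every node before being forwarded, whereas a wormhole-routed message is split into small pieces (called flits here) that follow the header through the path in pipeline fashion.

```python
import math


def store_and_forward_latency(message_bytes, links, bandwidth):
    """Assumed model: the whole message crosses one channel at a time,
    so the transfer time is paid once per channel on the path."""
    return links * (message_bytes / bandwidth)


def wormhole_latency(message_bytes, flit_bytes, links, bandwidth):
    """Assumed model: the header flit needs one flit time per channel,
    and the remaining flits arrive behind it, one flit time apart."""
    n_flits = math.ceil(message_bytes / flit_bytes)
    flit_time = flit_bytes / bandwidth
    return (links + n_flits - 1) * flit_time


if __name__ == "__main__":
    # Illustrative values only: 1024-byte message, 8-byte flits,
    # bandwidth of 1 byte per time unit.
    for links in (1, 10):
        saf = store_and_forward_latency(1024, links, 1.0)
        wh = wormhole_latency(1024, 8, links, 1.0)
        print(f"links={links}: store-and-forward={saf}, wormhole={wh}")
```

With these illustrative values, going from 1 to 10 channels multiplies the store-and-forward time by ten but adds only nine flit times to the wormhole time. The model deliberately leaves out channel contention, which is the effect the abstract identifies as the main difficulty when many messages of a collective operation share the network.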