Software Techniques for Improving MPP Bulk-Transfer Performance

Authors:
Eric A. Brewer;Paul Gauthier;Armando Fox;Angela Schuett
Venue:
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Year:
1996

References:

Active messages: a mechanism for integrated communication and computation. ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture.
High speed switch scheduling for local area networks. ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems.
Parallel hierarchical N-body methods.
Anatomy of a message in the Alewife multiprocessor. ICS '93 Proceedings of the 7th international conference on Supercomputing.
Software overhead in messaging layers: where does the time go? ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems.
Remote queues: exposing message queues for optimization and atomicity. Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures.
Scheduling of unstructured communication on the Intel iPSC/860. Proceedings of the 1994 ACM/IEEE conference on Supercomputing.
Limits on Interconnection Network Performance. IEEE Transactions on Parallel and Distributed Systems.
How to Get Good Performance from the CM-5 Data Network. Proceedings of the 8th International Symposium on Parallel Processing.
Many-to-many personalized communication with bounded traffic. FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation.
The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor.
SPLASH: Stanford parallel applications for shared-memory*

Abstract

Brewer & Kuszmaul (1994) demonstrated how barriers and traffic interleaving can alleviate the problem of bulk-transfer performance degradation on the Thinking Machines CM-5 massively parallel processor (MPP) by exploiting the observation that one-on-one communication avoids network congestion. We apply and extend these techniques on the Intel Paragon and MIT Alewife machines. Because these machines lack the CM-5's fast hardware support for barriers, we introduce a token-passing scheme that avoids barriers while maintaining one-on-one communication. We also introduce a new algorithm, distributed dynamic scheduling, that brings Brewer & Kuszmaul's observations to bear on irregular traffic patterns by massaging traffic into a sequence of near-permutations at runtime, without requiring any preprocessing or global state. The measured performance of our algorithm exceeds that of traffic interleaving (the most effective technique proposed by Brewer & Kuszmaul) on all three platforms, and is comparable to the performance of static scheduling, which requires preprocessing and global state.
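
The one-on-one property the abstract relies on can be illustrated with a regular all-to-all exchange split into rounds, where each round is a permutation: every node sends to exactly one destination and receives from exactly one source. The sketch below is an illustration only, assuming a simple cyclic-shift schedule. The function names are hypothetical, and it is not the paper's token-passing scheme or its distributed dynamic scheduling algorithm.

```python
def cyclic_shift_rounds(num_nodes):
    """Yield one list per round; entry i is the destination of node i.

    Assumption for illustration: round `shift` sends node i to
    (i + shift) mod num_nodes, so no node ever targets itself.
    """
    for shift in range(1, num_nodes):
        yield [(src + shift) % num_nodes for src in range(num_nodes)]


def is_permutation(destinations):
    """True if every node is the destination of exactly one sender."""
    return sorted(destinations) == list(range(len(destinations)))


# Build the schedule for a hypothetical 4-node machine and check that
# every round keeps communication one-on-one.
rounds = list(cyclic_shift_rounds(4))
all_one_on_one = all(is_permutation(r) for r in rounds)
```

Across the `num_nodes - 1` rounds, each ordered pair of distinct nodes appears exactly once, so the full exchange completes without any round in which two senders target the same receiver. Keeping the rounds from overlapping in time is the part that needs barriers or, as the abstract proposes, a barrier-free alternative.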