Modeling communication in parallel algorithms: a fruitful interaction between theory and systems?

Authors: Jaswinder Pal Singh; Edward Rothberg; Anoop Gupta
Affiliations: Computer Systems Laboratory, Stanford University, Stanford, CA; Intel Supercomputer Systems, 14924 NW Greenbrier Pkwy, C06-09, Beaverton, OR; Computer Systems Laboratory, Stanford University, Stanford, CA
Venue: SPAA '94: Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Year: 1994

Abstract

Recently, several theoretical models of parallel architectures have been proposed to replace the PRAM as the model presented to algorithm designers. A primary focus of the new models is to include the cost of interprocessor communication, which is increasingly important in modern parallel architectures. We argue that modeling the communication costs of the architecture or system is only one part of the problem. The other, and usually much more difficult, part is modeling the communication properties of the algorithm itself, which supply the inputs the architectural model needs to determine overall complexity. In this context, we make three main points in this paper: (i) It is incomplete to describe communication without regard to its relationship with replication. We propose describing the communication-replication relationship in terms of the working set hierarchy of an algorithm. (ii) Both inherent communication and the communication-replication relationship can be very difficult to model in the irregular, dynamic computations that are crucial to many real-world applications. We present examples that demonstrate this difficulty. (iii) We believe the computer systems community can provide substantial leverage in this effort by offering a hierarchy of simulation and profiling tools, ranging from abstract to detailed and tailored to the needs of algorithm designers. We propose an initial set of simulation tools and discuss possible future refinements to this set.
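As a rough illustration of point (i), the following sketch (ours, not the authors' tool set) models one processor's replication capacity as a fully associative LRU cache and counts misses, which here stand in for communication, as that capacity grows. The LRU policy, the block-granularity trace, and the hypothetical generator `tiled_trace` with its parameters are all assumptions made for illustration. Sharp drops in the resulting miss-rate curve mark levels of a working set hierarchy: growing replication between two drops buys little, while crossing a drop cuts communication sharply.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses of a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark block as most recently used
        else:
            misses += 1  # miss: block must be fetched (communicated)
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


def working_set_curve(trace, capacities):
    """Map each cache (replication) capacity to the resulting miss rate."""
    n = len(trace)
    return {c: lru_misses(trace, c) / n for c in capacities}


def tiled_trace(tiles=4, tile_size=4, reps=5, sweeps=3):
    """Hypothetical reference stream: each sweep visits every tile,
    re-reading the tile's blocks `reps` times before moving on."""
    return [tile * tile_size + b
            for sweep in range(sweeps)
            for tile in range(tiles)
            for rep in range(reps)
            for b in range(tile_size)]


if __name__ == "__main__":
    curve = working_set_curve(tiled_trace(), [2, 4, 8, 16])
    for cap, rate in curve.items():
        print(f"capacity {cap:2d} blocks: miss rate {rate:.3f}")
    # capacity  2 blocks: miss rate 1.000
    # capacity  4 blocks: miss rate 0.200
    # capacity  8 blocks: miss rate 0.200
    # capacity 16 blocks: miss rate 0.067
```

In this toy trace the two drops fall at 4 blocks (one tile, the inner working set) and 16 blocks (the whole data set); capacities in between change nothing. Irregular, dynamic computations, the subject of point (ii), rarely yield a trace this easy to reason about, which is one motivation for obtaining such curves by simulation rather than by hand.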