A Programming Methodology for Dual-Tier Multicomputers

Authors:
Scott B. Baden; Stephen J. Fink
Affiliations:
Univ. of California at San Diego, La Jolla; IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Year:
2000

Citing 27

Cedar Fortran and its compiler
CONPAR 90 Proceedings of the joint international conference on Vector and parallel processing

Implementation of a portable nested data-parallel language
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming

Fortran M: a language for modular parallel programming
Journal of Parallel and Distributed Computing

Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming

SoftFLASH: analyzing the performance of clustered distributed virtual shared memory
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems

A high-performance, portable implementation of the MPI message passing interface standard
Parallel Computing

Perspectives on Supercomputing: Three Decades of Change
Computer

Modeling the effects of contention on application performance in multi-user environments

SIMPLE: a methodology for programming high performance algorithms on clusters of symmetric multiprocessors (SMPs)

Programming tools and environments
Communications of the ACM

Efficient run-time support for irregular block-structured applications
Journal of Parallel and Distributed Computing - Special issue on irregular problems in supercomputing applications

Application-level scheduling on distributed heterogeneous networks
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing

Communication overlap in multi-tier parallel algorithms
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing

Multi-protocol active messages on a cluster of SMP's
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing

Fast Messages: Efficient, Portable Communication for Workstation Clusters and MPPs
IEEE Parallel & Distributed Technology: Systems & Technology

An Integrated Runtime and Compile-Time Approach for Parallelizing Structured and Block Structured Applications
IEEE Transactions on Parallel and Distributed Systems

Multiple Data Parallelism with HPF and KeLP
HPCN Europe 1998 Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking

Compositional C++: Compositional Parallel Programming
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing

Modernization of Legacy Application Software
PARA '98 Proceedings of the 4th International Workshop on Applied Parallel Computing, Large Scale Scientific and Industrial Problems

Flexible Communication Mechanisms for Dynamic Structured Applications
IRREGULAR '96 Proceedings of the Third International Workshop on Parallel Algorithms for Irregularly Structured Problems

Run-Time Support for Multi-tier Programming of Block-Structured Applications on SMP Clusters
ISCOPE '97 Proceedings of the Scientific Computing in Object-Oriented Parallel Environments

Scheduling From the Perspective of the Application
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing

A taxonomy of programming models for symmetric multiprocessors and SMP clusters
PMMP '95 Proceedings of the conference on Programming Models for Massively Parallel Computers

Minimizing overhead in parallel algorithms through overlapping communication/computation

Portable Run-Time Support for Dynamic Object-Oriented Parallel Processing

A parallel software infrastructure for dynamic block-irregular scientific calculations

Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines
Scientific Programming

Cited 14

Library support for orthogonal processor groups
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures

ORT: a communication library for orthogonal processor groups
Proceedings of the 2001 ACM/IEEE conference on Supercomputing

Performance Tradeoffs in Multi-tier Formulation of a Finite Difference Method
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I

Orthogonal Processor Groups for Message-Passing Programs
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking

Library support for hierarchical multi-processor tasks
Proceedings of the 2002 ACM/IEEE conference on Supercomputing

Solving irregularly structured problems based on distributed object model
Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing

Improving the execution time of global communication operations
Proceedings of the 1st conference on Computing frontiers

SCALLOP: A Highly Scalable Parallel Poisson Solver in Three Dimensions
Proceedings of the 2003 ACM/IEEE conference on Supercomputing

Tlib-a library to support programming with hierarchical multi-processor tasks
Journal of Parallel and Distributed Computing

Overlapping communication and computation with OpenMP and MPI
Scientific Programming

Mixed task and data parallel executions in general linear methods
Scientific Programming

Deploying applications in multi-SAN SMP clusters
International Journal of Computational Science and Engineering

Algorithms for memory hierarchies: advanced lectures

Hierarchical partitioning and dynamic load balancing for scientific computation
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing

Abstract

Hierarchically organized ensembles of shared memory multiprocessors possess a richer and more complex model of locality than previous generation multicomputers with single processor nodes. These dual-tier computers introduce many new factors into the programmer's performance model. We present a methodology for implementing block-structured numerical applications on dual-tier computers and a run-time infrastructure, called KeLP2, that implements the methodology. KeLP2 supports two levels of locality and parallelism via hierarchical SPMD control flow, run-time geometric meta-data, and asynchronous collective communication. KeLP applications can effectively overlap communication with computation under conditions where nonblocking point-to-point message passing fails to do so. KeLP's abstractions hide considerable detail without sacrificing performance and dual-tier applications written in KeLP consistently outperform equivalent single-tier implementations written in MPI. We describe the KeLP2 model and show how it facilitates the implementation of five block-structured applications specially formulated to hide communication latency on dual-tiered architectures. We support our arguments with empirical data from applications running on various single- and dual-tier multicomputers. KeLP2 supports a migration path from single-tier to dual-tier platforms and we illustrate this capability with a detailed programming example.
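
The overlap of communication with computation mentioned in the abstract can be illustrated with a minimal sketch. This is not KeLP2 code and does not use its API; it is a plain Python illustration under stated assumptions: a 1D averaging stencil split into two blocks, a single worker thread standing in for inter-node communication, and hypothetical helper names (`relax_reference`, `exchange_ghosts`, `relax_overlapped`). Each step starts the ghost-cell exchange, updates the interior points that need no remote data while the exchange is in flight, and only then updates the two points that depend on the ghost values.

```python
from concurrent.futures import ThreadPoolExecutor


def relax_reference(u, steps):
    """Averaging stencil on one undivided array; the two end values stay fixed."""
    u = list(u)
    for _ in range(steps):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = new
    return u


def exchange_ghosts(left, right):
    """Hypothetical stand-in for inter-node communication.

    Returns the ghost value each block needs from its neighbour:
    the left block needs right[0], the right block needs left[-1].
    """
    return right[0], left[-1]


def relax_overlapped(u, steps):
    """Same stencil on two blocks, overlapping the exchange with interior work."""
    mid = len(u) // 2
    left, right = list(u[:mid]), list(u[mid:])
    with ThreadPoolExecutor(max_workers=1) as comm:
        for _ in range(steps):
            # Start the "communication" asynchronously.
            pending = comm.submit(exchange_ghosts, left, right)

            # Interior points need no ghost data, so compute them now.
            new_left, new_right = left[:], right[:]
            for i in range(1, len(left) - 1):
                new_left[i] = 0.5 * (left[i - 1] + left[i + 1])
            for i in range(1, len(right) - 1):
                new_right[i] = 0.5 * (right[i - 1] + right[i + 1])

            # Wait for the exchange, then finish the block-boundary points.
            ghost_for_left, ghost_for_right = pending.result()
            new_left[-1] = 0.5 * (left[-2] + ghost_for_left)
            new_right[0] = 0.5 * (ghost_for_right + right[1])

            left, right = new_left, new_right
    return left + right


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.0]
    print(relax_overlapped(start, 5) == relax_reference(start, 5))
```

The sketch only shows the ordering of work that makes overlap possible. It does not model the two levels of locality (nodes and the processors within each node), and a Python thread gives no real speedup for this workload.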