This paper evaluates the IBM SP2 architecture, the AIX parallel programming environment, and the IBM message-passing library (MPL) through STAP (Space-Time Adaptive Processing) benchmark experiments. Because of the SP2's high communication overhead, only coarse-grain parallelism was exploited. A new parallelization scheme is developed for programming message-passing multicomputers. The parallel STAP benchmark structures are illustrated through domain decomposition, efficient mapping of partitioned programs, and optimization of collective communication operations. SP2 performance is measured in terms of execution time, Gflop/s rate, speedup over a single SP2 node, and overall system utilization. With 256 nodes, the Maui SP2 achieved its best performance of 23 Gflop/s in executing the High-Order Post-Doppler program, corresponding to 34% system utilization. A scalability analysis reveals the performance growth rate as a function of machine size and STAP problem size. Lessons learned from these parallel processing benchmark experiments are discussed in the context of real-time, adaptive radar signal processing on massively parallel processors (MPPs).
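As a sanity check on the reported figures, the abstract's utilization metric can be reproduced with a short sketch. This is not the authors' code; the per-node peak of 0.266 Gflop/s is an assumption (a commonly quoted peak for SP2 POWER2 thin nodes, 66.5 MHz at 4 floating-point operations per cycle) and is not stated in the abstract itself.

```python
# Hedged sketch: recompute system utilization from the abstract's numbers.
# Assumption: each SP2 node peaks at ~0.266 Gflop/s (not given in the abstract).

PEAK_PER_NODE_GFLOPS = 0.266   # assumed per-node peak rate
NODES = 256                    # machine size used on the Maui SP2
SUSTAINED_GFLOPS = 23.0        # reported High-Order Post-Doppler rate

def system_utilization(sustained, nodes, peak_per_node):
    """Sustained Gflop/s divided by aggregate peak Gflop/s."""
    return sustained / (nodes * peak_per_node)

util = system_utilization(SUSTAINED_GFLOPS, NODES, PEAK_PER_NODE_GFLOPS)
print(f"Utilization on {NODES} nodes: {util:.1%}")
```

Under the assumed per-node peak, this yields roughly 0.34, consistent with the 34% system utilization reported above.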