Improving Application Performance on the HP/Convex Exemplar

Authors:
Thomas Sterling; Phillip Merkey; Daniel Savarese
Venue:
Computer
Year:
1996

Abstract

The Earth and space sciences community faces rich computational challenges, ranging from static, regular, and embarrassingly parallel problems to dynamic, unstructured, and tightly coupled ones. This problem domain requires highly scalable systems that offer broad generality, efficiency, and programmability. These capabilities are appearing in the emerging class of scalable, cache-coherent shared-memory architectures such as the HP/Convex Exemplar SPP-1000. The goal of this class of architecture is to make scientific programming as easy and efficient as it is on vector supercomputers. The authors describe the Exemplar's architecture, whose global system organization comprises up to 16 multiprocessor nodes interconnected by four SCI (Scalable Coherent Interface) ring networks. They then present findings from four applications: the piecewise parabolic method, a finite-element method for unstructured meshes, a tree code for the n-body problem, and a particle-in-cell code. The application performance data were gathered after the Exemplar at Goddard Space Flight Center went into production use. These studies expose the operational properties of the Exemplar and determine its suitability for Earth and space sciences applications. The testing reveals that global cache coherence can be used effectively to simplify programming and data migration. However, the basic problem of locality sensitivity still demands direct programmer involvement to achieve effective system behavior. Whether message-passing or shared-memory programming models are better remains an open question.
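To make the locality point concrete, here is a minimal sketch of a two-level memory cost model. It assumes a machine in which each memory reference is served either by the local multiprocessor node or, at higher cost, by a remote node across the interconnect. The function names `average_access_time` and `slowdown` and the latency values are hypothetical, chosen only for illustration; they are not measurements reported in the article.

```python
"""Illustrative two-level (local vs. remote) memory cost model.

Assumption: every memory reference is served either by the local
multiprocessor node or by a remote node over the interconnect. The
relative latencies used below are hypothetical, not Exemplar measurements.
"""


def average_access_time(remote_fraction: float,
                        local_latency: float,
                        remote_latency: float) -> float:
    """Expected latency per reference for a given fraction of remote references."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must lie in [0, 1]")
    if local_latency <= 0.0 or remote_latency < local_latency:
        raise ValueError("expected 0 < local_latency <= remote_latency")
    # Weighted mix of the cost of local and remote references.
    return (1.0 - remote_fraction) * local_latency + remote_fraction * remote_latency


def slowdown(remote_fraction: float,
             local_latency: float,
             remote_latency: float) -> float:
    """Memory-access slowdown relative to a code whose references are all local."""
    return average_access_time(remote_fraction, local_latency, remote_latency) / local_latency


if __name__ == "__main__":
    # Hypothetical relative costs: a remote reference costs 4x a local one.
    LOCAL, REMOTE = 1.0, 4.0
    for f in (0.0, 0.1, 0.25, 0.5):
        print(f"remote fraction {f:4.2f}: slowdown {slowdown(f, LOCAL, REMOTE):.2f}x")
```

Under these assumed relative costs, a code that sends a quarter of its references off-node spends about 1.75 times as long on memory access as a perfectly localized one. Coherence hardware keeps such a code correct, but in this toy model only better data placement reduces the remote fraction.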