Effects of architectural and technological advances on the HP/Convex Exemplar's memory and communication performance

Authors:
Gheith A. Abandah; Edward S. Davidson
Affiliations:
Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor; Advanced Computer Architecture Laboratory, University of Michigan, Ann Arbor
Venue:
Proceedings of the 25th annual international symposium on Computer architecture
Year:
1998

Abstract

Advances in microarchitecture, packaging, and manufacturing processes enable designers to build new systems with higher performance and scalability. Using microbenchmark techniques, we contrast the memory and communication performance of two generations of the HP/Convex Exemplar scalable parallel processing system. The SPP1000 and SPP2000 have significant architectural and implementation differences, but maintain upward binary compatibility. The SPP2000 employs manufacturing and packaging advances to obtain shorter system interconnects with wider data paths and improved functionality, thereby reducing the latency and increasing the bandwidth of remote communication. Although memory latency is not significantly improved, the newer out-of-order execution processors, coupled with nonblocking caches, achieve much higher memory bandwidth. The SPP2000 has a richer system interconnect topology that allows scalability to a larger number of processors. It also employs innovations in its coherence protocols to improve synchronization and communication performance. This paper characterizes the performance effects of these changes and identifies remaining inefficiencies in the cache coherence protocol and the node configuration that future systems should address.
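
As a rough illustration of the kind of microbenchmark technique the abstract refers to, the following is a minimal sketch of a pointer-chasing latency probe. It is not the authors' benchmark suite: the function names (`build_chase_chain`, `chase`, `time_per_access`) and all parameters are hypothetical, and Python is used only for readability, so interpreter overhead will dominate the measured times. The sketch shows the general structure only: build a chain of dependent accesses that visits every slot exactly once, then time a long walk over it.

```python
import random
import time


def build_chase_chain(n, seed=0):
    """Return a list nxt such that following i -> nxt[i] visits all n slots in one cycle.

    Uses Sattolo's algorithm, which yields a random permutation with a single cycle,
    so the walk cannot get trapped in a small loop that fits in a cache.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j is strictly less than i, which forces one cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Perform `steps` dependent accesses starting at slot 0; return the final slot."""
    i = 0
    for _ in range(steps):
        i = nxt[i]  # each access depends on the result of the previous one
    return i


def time_per_access(n, steps=200_000):
    """Average wall-clock seconds per dependent access for a working set of n slots."""
    nxt = build_chase_chain(n)
    chase(nxt, n)  # warm-up pass touches every slot once
    start = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - start) / steps


if __name__ == "__main__":
    # Sweep working-set sizes (hypothetical values chosen for illustration).
    for n in (1 << 10, 1 << 14, 1 << 18):
        print(n, time_per_access(n))
```

In a compiled-language version of this pattern, sweeping the working-set size typically exposes steps in the per-access time as the data outgrows each level of the memory hierarchy. Because each access depends on the previous one, the walk measures latency; a bandwidth probe would instead issue independent accesses.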