An experimental evaluation of the HP V-Class and SGI Origin 2000 multiprocessors using microbenchmarks and scientific applications

  • Authors and Affiliations:
  • Ravi Iyer (Intel Corporation)
  • Jack Perdue (Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX)
  • Lawrence Rauchwerger (Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX)
  • Nancy M. Amato (Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX)
  • Laxmi Bhuyan (Department of Computer Science and Engineering, University of California, Riverside, CA)

  • Venue:
  • International Journal of Parallel Programming
  • Year:
  • 2005

Abstract

As processor technology continues to advance at a rapid pace, memory access latency has become the principal performance bottleneck of shared-memory systems. To understand how the cache and memory hierarchy affect system latencies, performance analysts benchmark existing multiprocessors. In this study, we present a detailed comparison of two architectures, the HP V-Class and the SGI Origin 2000, with the goal of contrasting the design techniques used in each. We examine the impact of processor design, cache/memory hierarchies, and coherence protocol optimizations on the memory system performance of these multiprocessors. We also study the effect of parallelism overheads, such as process creation and synchronization, on their user-level performance. Our experimental methodology uses both microbenchmarks and scientific applications to characterize user-level performance. The microbenchmark results show the impact of L1/L2 cache size and TLB size on uniprocessor load/store latencies, the effect of coherence protocol design/optimizations and data sharing patterns on multiprocessor memory access latencies, and the overhead of parallelism. The application-based evaluation shows the impact of problem size, dominant sharing patterns, and the number of processors used on speedup and raw execution time. Finally, we use hardware counter measurements to study the correlation between system-level performance metrics and application execution time.