An experimental evaluation of the HP V-Class and SGI Origin 2000 multiprocessors using microbenchmarks and scientific applications

  • Authors and Affiliations:
  • Ravi Iyer (Intel Corporation)
  • Jack Perdue (Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX)
  • Lawrence Rauchwerger (Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX)
  • Nancy M. Amato (Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX)
  • Laxmi Bhuyan (Department of Computer Science and Engineering, University of California, Riverside, CA)

  • Venue:
  • International Journal of Parallel Programming
  • Year:
  • 2005

Abstract

As processor technology continues to advance at a rapid pace, memory access latency has become the principal performance bottleneck of shared-memory systems. To understand how the cache and memory hierarchy affect system latencies, performance analysts benchmark existing multiprocessors. In this study, we present a detailed comparison of two architectures, the HP V-Class and the SGI Origin 2000, with the goal of contrasting the design techniques used in each. We examine the impact of processor design, cache/memory hierarchies, and coherence protocol optimizations on the memory system performance of these multiprocessors. We also study the effect of parallelism overheads, such as process creation and synchronization, on their user-level performance. Our experimental methodology uses both microbenchmarks and scientific applications to characterize user-level performance. The microbenchmark results show the impact of L1/L2 cache size and TLB size on uniprocessor load/store latencies, the effect of coherence protocol design/optimizations and data sharing patterns on multiprocessor memory access latencies, and the overhead of parallelism. The application-based evaluation shows the impact of problem size, dominant sharing patterns, and the number of processors used on speedup and raw execution time. Finally, we use hardware counter measurements to study the correlation between system-level performance metrics and application execution time.