Predicting Performance on SMPs. A Case Study: The SGI Power Challenge

Authors:
Jack Perdue
Venue:
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Year:
2000

Abstract

We study performance prediction on the SGI Power Challenge, a typical SMP. On such a platform, the cost of a memory access depends on its locality and on contention among processors. By running a carefully designed suite of microbenchmarks, we provide quantitative evidence that memory hierarchy effects impact performance far more substantially than contention-related phenomena. We also fit three cost functions based on variants of the BSP model, which do not account for the hierarchy, and a newly defined function F, expressed in terms of hardware counters, which captures both memory hierarchy and contention effects. We test the accuracy of all the functions on both synthetic and application benchmarks, showing that, unlike the other functions, F achieves an excellent level of accuracy in all cases. Although hardware counters are only available at run time, we give evidence that function F can still be employed as a prediction tool by extrapolating values of the counters from pilot runs on small input sizes.
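
The extrapolation idea in the last sentence can be illustrated with a minimal sketch. The abstract does not give the form of F, the counters it uses, or the extrapolation method, so everything below is an assumption made for illustration: the counter names, the power-law growth model, the linear cost function standing in for F, and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def fit_power_law(sizes, counts):
    """Least-squares fit of counts ~ a * n**b, done in log-log space.

    Assumption: counter values grow as a power of the input size.
    The paper may use a different extrapolation model.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in counts]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept = log of the constant
    return a, b


def extrapolate_counters(pilot_runs, target_size):
    """Predict each counter at target_size from small pilot runs.

    pilot_runs maps a (hypothetical) counter name to a list of
    (input_size, observed_value) pairs.
    """
    predicted = {}
    for name, samples in pilot_runs.items():
        sizes = [n for n, _ in samples]
        values = [v for _, v in samples]
        a, b = fit_power_law(sizes, values)
        predicted[name] = a * target_size ** b
    return predicted


def predict_time(counters, unit_costs):
    """Hypothetical stand-in for F: a weighted sum of counter values.

    unit_costs would be fitted from microbenchmarks; the weights used
    below are made up.
    """
    return sum(unit_costs[name] * counters[name] for name in unit_costs)


# Made-up pilot measurements on three small input sizes.
pilot = {
    "l1_misses": [(1000, 2000.0), (2000, 4000.0), (4000, 8000.0)],
    "l2_misses": [(1000, 500.0), (2000, 1000.0), (4000, 2000.0)],
}
# Made-up per-event costs in seconds.
costs = {"l1_misses": 1e-8, "l2_misses": 1e-7}

estimated = extrapolate_counters(pilot, target_size=1_000_000)
print(predict_time(estimated, costs))
```

The sketch separates the two steps the abstract mentions: counters are first extrapolated from the pilot runs, and only then fed to the cost function, so a different growth model or a different form of F could be swapped in without touching the other step.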