Computational power of pipelined memory hierarchies
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Optimal organizations for pipelined hierarchical memories
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Distribution Sweeping on Clustered Machines with Hierarchical Memories
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
SmartApps: An Application Centric Approach to High Performance Computing
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
International Journal of Parallel Programming
On approximating the ideal random access machine by physical machines
Journal of the ACM (JACM)
STAPL: an adaptive, generic parallel C++ library
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Hi-index | 0.00 |
We study the issue of performance prediction on the SGI-Power Challenge, a typical SMP. On such a platform, the cost of memory accesses depends on their locality and on contention among processors. By running a carefully designed suite of microbenchmarks, we provide quantitative evidence that memory hierarchy effects impact performance far more substantially than other phenomena related to contention. We also fit three cost functions based on variants of the BSP model, which do not account for the hierarchy, and a newly defined function F, expressed in terms of hardware counters, which captures both memory hierarchy and contention effects. We test the accuracy of all the functions on both synthetic and application benchmarks showing that, unlike the other functions, F achieves an excellent level of accuracy in all cases. Although hardware counters are only available at run-time, we give evidence that function F can still be employed as a prediction tool by extrapolating values of the counters from pilot runs on small input sizes.