Scal-Tool: pinpointing and quantifying scalability bottlenecks in DSM multiprocessors
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Hi-index | 0.00 |
Proposes an analytical model that quantifies the overall execution time of a parallel region in the presence of non-deterministic load imbalance introduced by network contention and by a random replacement policy in processor caches. We present a novel model that evaluates the expected hit ratio and variance introduced by a cache accessed with a cyclic access stream. We also model the performance improvement of fuzzy barriers, where the synchronization between processors at the end of a parallel region is relaxed. Experiments on a 64-processor KSR (Kendall Square Research) system which has random first-level caches confirms the general nature of the analytic results.