Distributing Hot-Spot Addressing in Large-Scale Multiprocessors
IEEE Transactions on Computers
Measuring VAX 8800 performance with a histogram hardware monitor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Behavioral characterization of multiprocessor memory systems: a case study
SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Characterizing the caching and synchronization performance of a multiprocessor operating system
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
The cedar system and an initial performance study
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Vector performance analysis of the NEC SX-2
ICS '90 Proceedings of the 4th international conference on Supercomputing
Cache Performance in the VAX-11/780
ACM Transactions on Computer Systems (TOCS)
The DASH Prototype: Logic Overhead and Performance
IEEE Transactions on Parallel and Distributed Systems
Machine Characterization and Benchmark Performance Prediction
Machine Characterization and Benchmark Performance Prediction
Predicting application behavior in large scale shared-memory multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Evaluating the impact of advanced memory systems on compiler-parallelized codes
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Characterizing the Memory Behavior of Compiler-Parallelized Applications
IEEE Transactions on Parallel and Distributed Systems
On the value of preemption in scheduling
APPROX'06/RANDOM'06 Proceedings of the 9th international conference on Approximation Algorithms for Combinatorial Optimization Problems, and 10th international conference on Randomization and Computation
SWAT'12 Proceedings of the 13th Scandinavian conference on Algorithm Theory
Journal of Scheduling
Hi-index | 0.00 |
This study presents a characterization of (1) the global memory and interconnection network contention overhead, (2) the operating system overheads, and (3) the runtime system parallelization overheads for the Cedar shared-memory multiprocessor. The measurements were obtained using five representative compute-intensive, scientific, loop parallel applications from the Perfect Benchmark Suite. The overheads were measured for a range of Cedar configurations from 1 processor to the full 4-cluster/32-processor configuration, thus characterizing the effect of this scaling on the overheads. For the full 4-cluster Cedar, the operating system overhead was found to constitute 5--21% of the total completion time of an application. The parallelization overhead accounts for 10--25% of the application completion time, and the overhead due to global memory and network contention contributes 8--21% of the application completion time.