Measurement-based characterization of global memory and network contention, operating system and parallelization overheads

Authors:
C. Natarajan;S. Sharma;R. K. Iyer
Affiliations:
Center for Reliable and High Performance Computing, University of Illinois at Urbana-Champaign, 1308 W. Main Street, Urbana, IL;Center for Reliable and High Performance Computing, University of Illinois at Urbana-Champaign, 1308 W. Main Street, Urbana, IL;Center for Reliable and High Performance Computing, University of Illinois at Urbana-Champaign, 1308 W. Main Street, Urbana, IL
Venue:
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Year:
1994

Abstract

This study presents a characterization of (1) the global memory and interconnection network contention overhead, (2) the operating system overheads, and (3) the runtime system parallelization overheads for the Cedar shared-memory multiprocessor. The measurements were obtained using five representative compute-intensive, scientific, loop-parallel applications from the Perfect Benchmark Suite. The overheads were measured for a range of Cedar configurations, from 1 processor to the full 4-cluster/32-processor configuration, thus characterizing the effect of this scaling on the overheads. For the full 4-cluster Cedar, the operating system overhead was found to constitute 5–21% of the total completion time of an application, the parallelization overhead 10–25%, and the overhead due to global memory and network contention 8–21%.
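The percentages above are shares of an application's total completion time. As a minimal sketch of that arithmetic, the snippet below converts per-category overhead times into such shares. The function name `overhead_shares`, the category labels, and all timing values are hypothetical and chosen only for illustration; they are not taken from the study, and the study's actual measurement method is not described in the abstract.

```python
def overhead_shares(completion_time, overheads):
    """Return each overhead as a percentage of total completion time.

    completion_time: total time of the application run.
    overheads: mapping from overhead category to time spent in it,
               in the same unit as completion_time.
    """
    if completion_time <= 0:
        raise ValueError("completion_time must be positive")
    if sum(overheads.values()) > completion_time:
        raise ValueError("overheads exceed completion time")
    return {name: 100.0 * t / completion_time for name, t in overheads.items()}


# Hypothetical timings in seconds, invented for illustration only;
# they are not measurements from the study.
example = overhead_shares(
    200.0,
    {"os": 20.0, "parallelization": 30.0, "contention": 24.0},
)
print(example)
```

With these invented inputs, each category's share is simply its time divided by the 200-second completion time; whatever remains is time not attributed to any of the three overheads.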