Reducing power through compiler-directed barrier synchronization elimination

Authors:
Mahmut Kandemir;Seung Woo Son
Affiliations:
Pennsylvania State University, University Park, PA;Pennsylvania State University, University Park, PA
Venue:
Proceedings of the 2006 international symposium on Low power electronics and design
Year:
2006

Abstract

Interprocessor synchronization, while essential for ensuring correct execution, can be very costly in terms of both power and performance. Unfortunately, many parallelizing compilers are conservative and insert a barrier synchronization at the end of every parallel loop. This can lead to significant power consumption in chip multiprocessor based execution environments. This paper proposes a compiler-directed approach for eliminating such synchronization calls between neighboring parallel loops. It achieves this by partitioning loop iterations across processors such that each processor executes the iterations from both loops that access the same set of array elements. We implemented the proposed approach in an experimental compilation framework and ran experiments on ten SPEC benchmark codes. Our experiments show that the approach reduces the energy overhead due to synchronization by about 75.5%, which corresponds to an average saving of around 5.48% in overall energy consumption.
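The partitioning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the simplest case, where iteration i of both loops touches only element i of each array, and the names `block_partition` and `run_fused` are hypothetical. Under that assumption, giving each worker the same block of iterations in both loops means every value a worker reads in the second loop was written by that same worker, so no barrier is needed between the loops.

```python
from concurrent.futures import ThreadPoolExecutor


def block_partition(n, workers):
    """Split range(n) into contiguous blocks, one per worker (hypothetical helper)."""
    base, extra = divmod(n, workers)
    blocks, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def run_fused(a, b, workers):
    """Run two neighboring parallel loops without a barrier between them.

    Assumption: iteration i of either loop accesses only a[i] and b[i],
    so assigning identical blocks to a worker in both loops keeps every
    dependence inside that worker.
    """
    def work(block):
        for i in block:      # loop 1: writes a[i], reads b[i]
            a[i] = 2 * b[i]
        # A conservative compiler would place a barrier here.
        for i in block:      # loop 2: reads a[i] written by this same worker
            b[i] = a[i] + 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The only synchronization left is the join after both loops.
        list(pool.map(work, block_partition(len(a), workers)))
    return a, b


if __name__ == "__main__":
    a, b = run_fused([0] * 10, list(range(10)), workers=3)
    print(a)
    print(b)
```

When the two loops access different elements per iteration (for example, the second loop reads a[i+1]), identical blocks no longer suffice, and the iteration-to-processor mapping has to be derived from the array access patterns, which is the case the paper addresses.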