Synchronization minimization in a SPMD execution model
Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
The Omega Library interface guide
The Omega Library interface guide
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Array SSA form and its use in parallelization
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A global communication optimization technique based on data-flow analysis and linear algebra
ACM Transactions on Programming Languages and Systems (TOPLAS)
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
An Exact Method for Analysis of Value-based Array Data Dependences
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
The Thrifty Barrier: Energy-Aware Synchronization in Shared-Memory Multiprocessors
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Hi-index | 0.00 |
Interprocessor synchronization, while extremely important for ensuring execution correctness, can be very costly in terms of both power and performance overheads. Unfortunately, many parallelizing compilers are very conservative in inserting barrier synchronizations at the end of each and every parallel loop. This can lead to significant power consumption in chip multiprocessor based execution environments. This paper proposes a compiler-directed approach for eliminating such synchronization calls between neighboring parallel loops. It achieves its goal by partitioning loop iterations across processors such that each processor executes iterations from both the loops that access the same set of array elements. We implemented the proposed approach using an experimental compilation framework and made experiments with ten SPEC benchmark codes. Our experiments clearly show that the proposed compiler-directed approach is very effective and reduces energy overheads due to synchronizations by about 75.5%, and this corresponds to around 5.48% saving on average in overall energy consumption.