Interprocedural strength reduction of critical sections in explicitly-parallel programs

Authors:
Rajkishore Barik;Jisheng Zhao;Vivek Sarkar
Affiliations:
Intel, Santa Clara, CA, USA;Rice University, Houston, TX, USA;Rice University, Houston, TX, USA
Venue:
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Year:
2013

Abstract

In this paper, we introduce novel compiler optimization techniques that reduce the number of operations performed in critical sections in explicitly-parallel programs. Specifically, we focus on three code transformations: 1) Partial Strength Reduction (PSR) of critical sections, which replaces critical sections with non-critical sections on certain control flow paths; 2) Critical Load Elimination (CLE), which replaces memory accesses within a critical section with accesses to scalar temporaries that hold values loaded outside the critical section; and 3) Non-critical Code Motion (NCM), which hoists thread-local computations out of critical sections. Interprocedural analysis further increases the effectiveness of the first two transformations. We demonstrate the effectiveness of our techniques on critical section constructs from three different explicitly-parallel programming models: the isolated construct in Habanero Java (HJ), the synchronized construct in standard Java, and transactions in the Java-based Deuce software transactional memory system. We used two SMP platforms (a 16-core Intel Xeon SMP and a 32-core IBM Power7 SMP) to evaluate our optimizations on 17 explicitly-parallel benchmark programs that span all three models. Our results show that the optimizations introduced in this paper deliver measurable performance improvements, and that the improvements grow as the program is run on a larger number of processor cores. These results underscore the importance of optimizing critical sections, and indicate that the benefits of such optimizations will continue to increase with the number of cores in future many-core processors.
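To make the three transformations concrete, the following hand-written Java sketch shows what their combined effect might look like on a small synchronized block. It is an illustration only, not the paper's algorithm or compiler output. The class name CriticalSectionSketch, the helper expensive, and the fields scale and total are hypothetical. The sketch assumes that scale is never written once the worker threads have started, which is what makes it legal to load it outside the critical section here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of PSR, CLE and NCM applied by hand.
class CriticalSectionSketch {
    static final Object LOCK = new Object();
    // Assumption: written only before any worker thread starts.
    static int scale = 3;
    // Shared accumulator; every update is guarded by LOCK.
    static long total = 0;

    // Stand-in for a thread-local, side-effect-free computation.
    static int expensive(int x) {
        int h = x;
        for (int i = 0; i < 100; i++) {
            h = h * 31 + i;
        }
        return h & 0xff; // value in 0..255
    }

    // Original form: everything happens while holding the lock.
    static void updateBefore(int x) {
        synchronized (LOCK) {
            int v = expensive(x);          // thread-local work
            if (v > 127) {                 // condition uses only locals
                total += (long) v * scale; // load of scale inside the section
            }
        }
    }

    // Transformed form.
    static void updateAfter(int x) {
        int v = expensive(x);   // NCM: thread-local work hoisted out
        int s = scale;          // CLE: load moved out into a scalar temporary
        if (v > 127) {          // PSR: the false path takes no lock at all
            synchronized (LOCK) {
                total += (long) v * s;
            }
        }
    }

    // Runs one variant on several threads and returns the final total.
    static long run(boolean optimized, int threads, int itersPerThread) {
        total = 0;
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int base = t * itersPerThread;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < itersPerThread; i++) {
                    if (optimized) {
                        updateAfter(base + i);
                    } else {
                        updateBefore(base + i);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = run(false, 4, 10000);
        long after = run(true, 4, 10000);
        System.out.println("same result: " + (before == after));
    }
}
```

Both variants compute the same total, but the transformed one holds the lock only for the single shared update, and only on the path that actually performs it. In a real compiler, deciding that expensive is thread-local and that scale cannot be written concurrently would require the kind of side-effect and may-happen-in-parallel analysis the abstract alludes to; the sketch simply assumes those facts.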