Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Concurrency analysis in the presence of procedures using a data-flow framework
TAV4 Proceedings of the symposium on Testing, analysis, and verification
Scalar replacement in the presence of conditional control flow
Software—Practice & Experience
Register promotion in C programs
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Register promotion by sparse partial redundancy elimination of loads and stores
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A conservative data flow algorithm for detecting all pairs of statements that may happen in parallel
SIGSOFT '98/FSE-6 Proceedings of the 6th ACM SIGSOFT international symposium on Foundations of software engineering
Load-reuse analysis: design and evaluation
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Effective synchronization removal for Java
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Location Consistency-A New Memory Model and Cache Consistency Protocol
IEEE Transactions on Computers
Concurrent Static Single Assignment Form and Constant Propagation for Explicitly Parallel Programs
LCPC '97 Proceedings of the 10th International Workshop on Languages and Compilers for Parallel Computing
Static Analyses for Eliminating Unnecessary Synchronization from Java Programs
SAS '99 Proceedings of the 6th International Symposium on Static Analysis
Unified Analysis of Array and Object References in Strongly Typed Languages
SAS '00 Proceedings of the 7th International Symposium on Static Analysis
Optimizing Mutual Exclusion Synchronization in Explicitly Parallel Programs
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
An efficient static analysis algorithm to detect redundant memory operations
Proceedings of the 2002 workshop on Memory system performance
Static conflict analysis for multi-threaded object-oriented programs
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
The DaCapo benchmarks: java benchmarking development and analysis
Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications
May-happen-in-parallel analysis of X10 programs
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Goldilocks: a race and transaction-aware java runtime
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Statistically rigorous java performance evaluation
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs
IEEE Transactions on Computers
Static Detection of Place Locality and Elimination of Runtime Checks
APLAS '08 Proceedings of the 6th Asian Symposium on Programming Languages and Systems
Accelerating critical section execution with asymmetric multi-core architectures
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Synchronization optimizations for efficient execution on multi-cores
Proceedings of the 23rd international conference on Supercomputing
Interprocedural Load Elimination for Dynamic Optimization of Parallel Programs
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
A type and effect system for deterministic parallel Java
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
ICPP '09 Proceedings of the 2009 International Conference on Parallel Processing
Data marshaling for multi-core architectures
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Cooperative reasoning for preemptive execution
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
A case for an SC-preserving compiler
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
DISC'06 Proceedings of the 20th international conference on Distributed Computing
Using inter-procedural side-effect information in JIT optimizations
CC'05 Proceedings of the 14th international conference on Compiler Construction
HELIX: automatic parallelization of irregular programs for chip multiprocessing
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Isolation for nested task parallelism
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Hi-index | 0.00 |
In this paper, we introduce novel compiler optimization techniques to reduce the number of operations performed in critical sections that occur in explicitly-parallel programs. Specifically, we focus on three code transformations: 1) Partial Strength Reduction (PSR) of critical sections to replace critical sections by non-critical sections on certain control flow paths; 2) Critical Load Elimination (CLE) to replace memory accesses within a critical section by accesses to scalar temporaries that contain values loaded outside the critical section; and 3) Non-critical Code Motion (NCM) to hoist thread-local computations out of critical sections. The effectiveness of the first two transformations is further increased by interprocedural analysis. The effectiveness of our techniques has been demonstrated for critical section constructs from three different explicitly-parallel programming models --- the isolated construct in Habanero Java (HJ), the synchronized construct in standard Java, and transactions in the Java-based Deuce software transactional memory system. We used two SMP platforms (a 16-core Intel Xeon SMP and a 32-Core IBM Power7 SMP) to evaluate our optimizations on 17 explicitly-parallel benchmark programs that span all three models. Our results show that the optimizations introduced in this paper can deliver measurable performance improvements that increase in magnitude when the program is run with a larger number of processor cores. These results underscore the importance of optimizing critical sections, and the fact that the benefits from such optimizations will continue to increase with increasing numbers of cores in future many-core processors.