Synchronization optimizations for efficient execution on multi-cores

Authors:
Alexandru Nicolau;Guangqiang Li;Alexander V. Veidenbaum;Arun Kejariwal
Affiliations:
University of California, Irvine, Irvine, USA;University of California, Irvine, Irvine, USA;University of California, Irvine, USA;Yahoo! Inc., Santa Clara, USA
Venue:
Proceedings of the 23rd international conference on Supercomputing
Year:
2009

Abstract

Multi-cores are becoming ubiquitous, as exemplified by Sun's Niagara-2, Intel's Nehalem, and AMD's Sao Paulo octal cores. The number of cores per chip is expected to rise in the foreseeable future, as evidenced by Intel's recently announced 80-core Teraflops Research Chip. Exploiting the parallelism of multi-cores necessitates concurrent software. One way to parallelize programs that are not amenable to auto-parallelization is via explicit synchronization. The placement of the synchronization primitives has a large bearing on how much thread-level parallelism (TLP) can be achieved. In this paper, we propose novel predication-based and other adjunct synchronization optimizations that facilitate exploitation of a higher level of TLP than can be achieved using the state of the art. We demonstrate the efficacy of our techniques on a real machine, using real codes, specifically from the industry-standard SPEC CPU benchmarks and other widely used open source codes such as PostgreSQL. Our results show that the proposed techniques yield significantly higher levels of TLP than the state of the art.
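
To make the point about synchronization placement concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a loop in which only a conditionally executed update touches shared state. All names (`sum_large_squares_coarse`, `sum_large_squares_predicated`, `_run`) are hypothetical and chosen for illustration. The first version holds a lock around the whole iteration body; the second acquires it only under the predicate that guards the shared update, so the independent work runs unsynchronized. In CPython the global interpreter lock limits real speedup, so the sketch shows only that the two placements compute the same result.

```python
import threading


def _run(worker, items, num_threads):
    """Split items round-robin across threads and wait for all to finish."""
    chunks = [items[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def sum_large_squares_coarse(items, threshold, num_threads=4):
    """Coarse placement: the lock covers the entire iteration body."""
    lock = threading.Lock()
    total = [0]  # shared state

    def worker(chunk):
        for x in chunk:
            with lock:                # serializes every iteration
                y = x * x             # independent work, needs no lock
                if y > threshold:
                    total[0] += y     # the only access to shared state

    _run(worker, items, num_threads)
    return total[0]


def sum_large_squares_predicated(items, threshold, num_threads=4):
    """Predicated placement: lock only when the guarding condition holds."""
    lock = threading.Lock()
    total = [0]  # shared state

    def worker(chunk):
        for x in chunk:
            y = x * x                 # runs without synchronization
            if y > threshold:         # predicate guards the shared update
                with lock:
                    total[0] += y

    _run(worker, items, num_threads)
    return total[0]
```

Under these assumptions, iterations whose predicate is false never contend for the lock in the second version, which is the kind of effect that placement choices can have on achievable TLP.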