Efficient hardware for multiway jumps and pre-fetches
MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
Automatic generation of DAG parallelism
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Linearizability: a correctness condition for concurrent objects
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic recognition of induction variables and recurrence relations by abstract interpretation
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Instruction reordering for fork-join parallelism
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Functional parallelism: theoretical foundations and implementation
Functional parallelism: theoretical foundations and implementation
Guarded execution and branch prediction in dynamic ILP processors
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Beyond induction variables: detecting and classifying sequences using a demand-driven SSA form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler optimizations for eliminating barrier synchronization
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Symbolic analysis for parallelizing compilers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Analysis techniques for predicated code
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Global predicate analysis and its application to register allocation
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Synchronization transformations for parallel computing
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Streamlining inter-operation memory communication via data dependence prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Advanced compiler design and implementation
Advanced compiler design and implementation
Lock coarsening: eliminating lock overhead in automatically parallelized object-based programs
Journal of Parallel and Distributed Computing
Performance limitations of the Java core libraries
JAVA '99 Proceedings of the ACM 1999 conference on Java Grande
Redundant Synchronization Elimination for DOACROSS Loops
IEEE Transactions on Parallel and Distributed Systems
Classifying load and store instructions for memory renaming
ICS '99 Proceedings of the 13th international conference on Supercomputing
Removing unnecessary synchronization in Java
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Memory Renaming: Fast, Early and Accurate Processing of Memory Communication
International Journal of Parallel Programming
ACM Transactions on Computer Systems (TOCS)
Accurate and efficient predicate analysis with binary decision diagrams
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Monotonic evolution: an alternative to induction variable substitution for dependence analysis
ICS '01 Proceedings of the 15th international conference on Supercomputing
Dependence Analysis
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Dependence graphs and compiler optimizations
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Compiler Construction for Digital Computers
Compiler Construction for Digital Computers
Structure of Computers and Computations
Structure of Computers and Computations
Register Allocation, Renaming and Their Impact on Fine-Grain Parallelism
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Array SSA for Explicitly Parallel Programs
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Thin locks: featherweight Synchronization for Java
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Proceedings of the 20th annual international conference on Supercomputing
Lightweight lock-free synchronization methods for multithreading
Proceedings of the 20th annual international conference on Supercomputing
Evaluating synchronization techniques for light-weight multithreaded/multicore architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the 34th annual international symposium on Computer architecture
Trace Scheduling: A Technique for Global Microcode Compaction
IEEE Transactions on Computers
Techniques for efficient placement of synchronization primitives
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Reducing task creation and termination overhead in explicitly parallel programs
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
How many threads to spawn during program multithreading?
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
HELIX: automatic parallelization of irregular programs for chip multiprocessing
Proceedings of the Tenth International Symposium on Code Generation and Optimization
A Transformation Framework for Optimizing Task-Parallel Programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Interprocedural strength reduction of critical sections in explicitly-parallel programs
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
Multi-cores are becoming ubiquitous as exemplified by Sun's Niagra-2, Intel's Nehalem and AMD's Sau Paulo octal cores. The number of cores per chip is expected to rise in foreseeable future, as evidenced by the recently announced Intel's 80-core Teraflops Research Chip. Exploiting the parallelism of multicores necessitates concurrent software. One way to parallelize programs, not amenable to auto-parallelization, is via explicit synchronization. The placement of the synchronization primitives has a large bearing on how much thread-level parallelism (TLP) can be achieved. In this paper, we propose novel predication-based and other adjunct synchronization optimizations which facilitate exploitation on higher level of TLP than what can be achieved using the state-of-the-art. We demonstrate the efficacy of our techniques, on a real machine, using real codes, specifically, from the industry-standard SPEC CPU benchmarks and other widely used open source codes such as PostgreSQL. Our results show that the proposed techniques yield significantly higher levels of TLP than the state-of-the-art.