Harnessing the hardware parallelism of emerging multi-core systems requires concurrent software. Unfortunately, most existing mainstream software is sequential. Although a given program can be auto-parallelized, the efficacy of auto-parallelization is largely limited to floating-point codes. One way to alleviate this limitation is to use explicit synchronization to parallelize programs that cannot be auto-parallelized. In this regard, efficient placement of the synchronization primitives (say, post and wait) plays a key role in achieving a high degree of thread-level parallelism (TLP). In this paper, we propose novel compiler techniques for this placement. Specifically, given a control flow graph (CFG), the proposed techniques place a post as early as possible and a wait as late as possible in the CFG, subject to dependences. We demonstrate the efficacy of our techniques on a real machine using real codes, specifically from the industry-standard SPEC CPU benchmarks, the Linux kernel, and other widely used open source codes. Our results show that the proposed techniques yield significantly higher levels of TLP than the state of the art.
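
The placement rule stated in the abstract (post as early as possible, wait as late as possible, subject to dependences) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a single straight-line block per thread instead of a full CFG, and a single shared variable carrying the cross-thread dependence. The names `Stmt`, `earliest_post`, and `latest_wait` are hypothetical and introduced only for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical statement model: (variables written, variables read).
Stmt = Tuple[Set[str], Set[str]]


def earliest_post(producer: List[Stmt], var: str) -> int:
    """Insertion index for post: directly after the last write to var.

    Posting any earlier would signal before the value is final.
    """
    last_write = -1
    for i, (writes, _reads) in enumerate(producer):
        if var in writes:
            last_write = i
    return last_write + 1


def latest_wait(consumer: List[Stmt], var: str) -> int:
    """Insertion index for wait: directly before the first access to var.

    Waiting any later would let the consumer touch var unsynchronized.
    """
    for i, (writes, reads) in enumerate(consumer):
        if var in writes or var in reads:
            return i
    return len(consumer)  # var is never accessed; wait can go at the end


# Producer writes x in statement 1, then does unrelated work.
producer = [({"a"}, set()), ({"x"}, {"a"}), ({"b"}, {"a"}), ({"c"}, {"b"})]
# Consumer does independent work first, then reads x in statement 2.
consumer = [({"p"}, set()), ({"q"}, {"p"}), ({"r"}, {"x", "q"}), ({"s"}, {"r"})]

post_at = earliest_post(producer, "x")
wait_at = latest_wait(consumer, "x")
print(post_at, wait_at)
```

In this sketch the producer's statements after the post and the consumer's statements before the wait can run concurrently, which is where the additional thread-level parallelism comes from. A conservative placement (post at the end of the producer, wait at the start of the consumer) would serialize the two blocks entirely.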