Sambamba: a runtime system for online adaptive parallelization
CC'12 Proceedings of the 21st international conference on Compiler Construction
How can we exploit a microprocessor as efficiently as possible? The "classic" approach is static optimization at compile time, which conservatively optimizes a program while keeping all possible uses in mind. Further optimization can only be achieved by anticipating the actual usage profile: if we know, for instance, that two computations will be independent, we can run them in parallel. However, brute-force parallelization may slow down execution because of its large overhead. Whether it pays off depends on runtime features, such as the structure and size of the input data, so parallel execution needs to adapt dynamically to the runtime situation at hand. Our SAMBAMBA framework implements such dynamic adaptation for regular sequential C programs through adaptive dispatch between sequential and parallel function instances. In an evaluation of 14 programs, we show that automatic parallelization in combination with adaptive dispatch can lead to speed-ups of up to 5.2-fold on a quad-core machine with hyperthreading. At this point, we rely on programmer annotations, but we will remove this requirement as the platform evolves to support efficient speculative optimizations.
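The idea of dispatching between a sequential and a parallel instance of the same function can be illustrated with a minimal sketch. This is not SAMBAMBA's implementation: the function names, the fixed size cutoff `PARALLEL_THRESHOLD`, and the worker count are all hypothetical, and a real system would presumably derive the decision from runtime measurements rather than a constant. Python threads are used only to show the structure; in standard CPython they do not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cutoff chosen for illustration; not a value from the paper.
PARALLEL_THRESHOLD = 10_000


def sum_squares_sequential(values):
    """Sequential function instance."""
    return sum(v * v for v in values)


def sum_squares_parallel(values, workers=4):
    """Parallel function instance: split the input into chunks,
    process each chunk in a worker, and combine the partial sums."""
    chunk = max(1, (len(values) + workers - 1) // workers)
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares_sequential, parts))


def sum_squares(values):
    """Dispatcher: pick an instance based on a runtime property of the input.
    Small inputs stay sequential to avoid the parallelization overhead."""
    if len(values) < PARALLEL_THRESHOLD:
        return sum_squares_sequential(values)
    return sum_squares_parallel(values)
```

Callers only see `sum_squares`; both instances compute the same result, so the dispatcher is free to change its choice from call to call as the input changes.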