Compilers: principles, techniques, and tools
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
On-the-fly detection of data races for programs with nested fork-join parallelism
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Cilk: an efficient multithreaded runtime system
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Eraser: a dynamic data race detector for multithreaded programs
ACM Transactions on Computer Systems (TOCS)
Proceedings of the ACM 2000 conference on Java Grande
Optimizing compilers for modern architectures: a dependence-based approach
OpenMP: An Industry-Standard API for Shared-Memory Programming
IEEE Computational Science & Engineering
Loop-Level Parallelism in Numeric and Symbolic Programs
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Limits of Task-Based Parallelism in Irregular Applications
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
In Search of Speculative Thread-Level Parallelism
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
IEEE Computer Architecture Letters
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Fourth IEEE International Workshop on Workload Characterization (WWC-4), 2001
Valgrind: a framework for heavyweight dynamic binary instrumentation
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Embla - Data Dependence Profiling for Parallel Programming
CISIS '08 Proceedings of the 2008 International Conference on Complex, Intelligent and Software Intensive Systems
Intel threading building blocks
Compiler-Driven Dependence Profiling to Guide Program Parallelization
Languages and Compilers for Parallel Computing
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Alchemist: A Transparent Dependence Distance Profiling Infrastructure
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
The Cilk++ concurrency platform
Proceedings of the 46th Annual Design Automation Conference
Quantifying the potential task-based dataflow parallelism in MPI applications
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
A model-driven approach for software parallelization
MODELS'11 Proceedings of the 2011 International Conference on Models in Software Engineering
Fast loop-level data dependence profiling
Proceedings of the 26th ACM international conference on Supercomputing
Multi-slicing: a compiler-supported parallel approach to data dependence profiling
Proceedings of the 2012 International Symposium on Software Testing and Analysis
Manual parallelization of programs is known to be difficult and error-prone, and there are currently few ways to measure the amount of potential parallelism in the original sequential code. We present an extension of Embla, a Valgrind-based dependence profiler that links dynamic dependences back to source code. The extended tool estimates the potential task-level parallelism in a sequential program and helps programmers exploit it at the source level. Using the popular fork-join model, it provides a realistic estimate of the speed-up attainable by parallelizing with frameworks such as Cilk, TBB, or OpenMP 3.0. Estimates can be given for several parallelization models that differ in the programmer effort and the capabilities required of the underlying implementation. The tool also outputs source-level dependence information to aid the parallelization of programs with abundant inherent parallelism, as well as critical paths to suggest algorithmic rewrites of programs with little of it. We validate these claims by running the tool on serial elisions of sample Cilk programs, finding additional inherent parallelism not exploited by the Cilk code, as well as on serial C benchmarks where the profiling results suggest parallelism-enhancing algorithmic rewrites.
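The speed-up estimate described in the abstract rests on the standard work/span analysis of the fork-join model: potential speed-up is bounded by total work divided by the critical path (span). A minimal sketch of that bound over a toy task tree is below; the tree shape, costs, and function names are illustrative assumptions, not the tool's actual data structures or algorithm.

```python
# Hedged sketch of the fork-join work/span speed-up bound.
# Each node is (cost, children): children of a node may run in
# parallel (fork) and must all finish before the node completes (join).
# Potential speed-up is bounded by work / span, the kind of quantity
# a dependence profiler can report. All names here are illustrative.

def work(node):
    # Total sequential time: sum of every task's cost.
    cost, children = node
    return cost + sum(work(c) for c in children)

def span(node):
    # Critical path: the most expensive root-to-leaf chain.
    cost, children = node
    return cost + (max(span(c) for c in children) if children else 0)

# A toy task tree: a root of cost 1 forking three equal subtasks.
leaf = (10, [])
tree = (1, [leaf, leaf, leaf])

print(work(tree))              # 31: total sequential time
print(span(tree))              # 11: critical path length
print(work(tree) / span(tree)) # upper bound on speed-up (~2.82)
```

A program whose span nearly equals its work has little inherent parallelism, which is where the critical-path output of the tool suggests an algorithmic rewrite rather than mere annotation.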