In recent years, multi-core computer systems have left the realm of high-performance computing, and virtually all of today's desktop computers and embedded computing systems are equipped with several processing cores. Still, no single parallel programming model has found widespread support, and parallel programming remains an art for the majority of application programmers. In addition, there exists a plethora of sequential legacy applications for which automatic parallelization is the only hope of benefiting from the increased processing power of modern multi-core systems. In the past, automatic parallelization largely focused on data parallelism. In this paper we present a novel approach to extracting and exploiting pipeline parallelism from sequential applications. We use profiling to overcome the limitations of static data and control flow analysis, enabling more aggressive parallelization. Our approach is orthogonal to existing automatic parallelization approaches, and additional data parallelism may be exploited in the individual pipeline stages. The key contribution of this paper is a whole-program representation that supports profiling, parallelism extraction, and exploitation. We demonstrate how this enhances conventional pipeline parallelization by incorporating support for multi-level loops and pipeline stage replication in a uniform and automatic way. We have evaluated our methodology on a set of multimedia and stream processing benchmarks and demonstrate speedups of up to 4.7 on an eight-core Intel Xeon machine.
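To make the terms "pipeline parallelism" and "pipeline stage replication" concrete, the following is a minimal hand-written sketch in Python, not the paper's representation or extraction method. It assumes a sequential loop that has already been split into three stages (produce, transform, collect) connected by bounded queues, with the middle stage replicated. All names (`run_pipeline`, `producer`, `worker`, `collector`, `SENTINEL`) are hypothetical and chosen only for illustration.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker (assumption of this sketch)


def producer(items, out_q, n_workers):
    # Stage 1: emit each item tagged with its sequence number.
    for seq, item in enumerate(items):
        out_q.put((seq, item))
    # One end-of-stream marker per replica of the next stage.
    for _ in range(n_workers):
        out_q.put(SENTINEL)


def worker(in_q, out_q, fn):
    # Stage 2 (replicated): apply fn to each item independently.
    while True:
        msg = in_q.get()
        if msg is SENTINEL:
            out_q.put(SENTINEL)  # tell the collector this replica is done
            return
        seq, item = msg
        out_q.put((seq, fn(item)))


def collector(in_q, n_workers, results):
    # Stage 3: restore the original order, since replicas may finish out of order.
    pending = {}
    next_seq = 0
    finished = 0
    while finished < n_workers:
        msg = in_q.get()
        if msg is SENTINEL:
            finished += 1
            continue
        seq, value = msg
        pending[seq] = value
        while next_seq in pending:
            results.append(pending.pop(next_seq))
            next_seq += 1


def run_pipeline(items, fn, replicas=2):
    # Bounded queues decouple the stages while limiting buffering.
    q1 = queue.Queue(maxsize=8)
    q2 = queue.Queue(maxsize=8)
    results = []
    threads = [threading.Thread(target=producer, args=(items, q1, replicas))]
    threads += [
        threading.Thread(target=worker, args=(q1, q2, fn))
        for _ in range(replicas)
    ]
    threads.append(threading.Thread(target=collector, args=(q2, replicas, results)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(10), lambda x: x * x, replicas=3))
```

The sketch shows only the structure: stages communicate through queues, and replicating the middle stage requires sequence numbers so that the final stage can reassemble results in program order. In CPython, threads do not speed up CPU-bound stages because of the global interpreter lock, so this illustrates the decomposition rather than the reported speedups.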