Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information

Authors:
Georgios Tournavitis; Björn Franke
Affiliations:
University of Edinburgh, Edinburgh, United Kingdom; University of Edinburgh, Edinburgh, United Kingdom
Venue:
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Year:
2010

Abstract

In recent years, multi-core computer systems have left the realm of high-performance computing, and virtually all of today's desktop computers and embedded computing systems are equipped with several processing cores. Still, no single parallel programming model has found widespread support, and parallel programming remains an art for the majority of application programmers. In addition, there exists a plethora of sequential legacy applications for which automatic parallelization is the only hope to benefit from the increased processing power of modern multi-core systems. In the past, automatic parallelization largely focused on data parallelism. In this paper we present a novel approach to extracting and exploiting pipeline parallelism from sequential applications. We use profiling to overcome the limitations of static data and control flow analysis, enabling more aggressive parallelization. Our approach is orthogonal to existing automatic parallelization approaches, and additional data parallelism may be exploited in the individual pipeline stages. The key contribution of this paper is a whole-program representation that supports profiling, parallelism extraction and exploitation. We demonstrate how this enhances conventional pipeline parallelization by incorporating support for multi-level loops and pipeline stage replication in a uniform and automatic way. We have evaluated our methodology on a set of multimedia and stream processing benchmarks and demonstrate speedups of up to 4.7x on an eight-core Intel Xeon machine.
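To make the terms "pipeline parallelism" and "pipeline stage replication" concrete, the following is a minimal hand-written sketch, not the representation or code generator described in the paper. It assumes a three-stage pipeline (produce, transform, collect) in which the middle stage is replicated across several worker threads. The names `run_pipeline`, `transform`, and `replicas` are hypothetical and chosen only for illustration.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def run_pipeline(items, transform, replicas=2):
    """Illustrative pipeline: produce -> transform (replicated) -> collect."""
    q_in = queue.Queue(maxsize=8)   # bounded buffer between stage 1 and stage 2
    q_out = queue.Queue(maxsize=8)  # bounded buffer between stage 2 and stage 3
    results = {}

    def producer():
        # Stage 1: tag each item with its index so order can be restored later.
        for index, item in enumerate(items):
            q_in.put((index, item))
        # One end-of-stream marker per replica of the middle stage.
        for _ in range(replicas):
            q_in.put(SENTINEL)

    def worker():
        # Stage 2 (replicated): every replica pulls from the same input queue.
        while True:
            msg = q_in.get()
            if msg is SENTINEL:
                q_out.put(SENTINEL)
                return
            index, item = msg
            q_out.put((index, transform(item)))

    def collector():
        # Stage 3: gather results until every replica has signalled completion.
        finished = 0
        while finished < replicas:
            msg = q_out.get()
            if msg is SENTINEL:
                finished += 1
                continue
            index, value = msg
            results[index] = value

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=worker) for _ in range(replicas)]
    threads.append(threading.Thread(target=collector))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Replicas may finish out of order; the indices restore the input order.
    return [results[i] for i in range(len(results))]
```

For example, `run_pipeline(range(5), lambda x: x * x, replicas=3)` returns `[0, 1, 4, 9, 16]`. The sketch shows only the structure of a replicated pipeline: in standard CPython the global interpreter lock prevents CPU-bound threads from running in parallel, so it should not be expected to reproduce the speedups reported above.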