A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs

Authors:
William Thies;Vikram Chandrasekhar;Saman Amarasinghe
Venue:
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2007

Citing 0
Cited 51

MAPS: an integrated framework for MPSoC application parallelization

Proceedings of the 45th annual Design Automation Conference
SoC-C: efficient programming abstractions for heterogeneous multicore systems on chip

CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
MPSoC Design Using Application-Specific Architecturally Visible Communication

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Copy or Discard execution model for speculative parallelization on multicores

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Extracting Coarse-Grained Pipelined Parallelism Out of Sequential Applications for Parallel Processor Arrays

ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Profiling Java programs for parallelism

IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Low-power inter-core communication through cache partitioning in embedded multiprocessors

Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design: Chip on the Dunes
Searching for Concurrent Design Patterns in Video Games

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Polymorphic pipeline array: a flexible multicore accelerator with virtualized execution for mobile multimedia applications

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Input-driven dynamic execution prediction of streaming applications

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Speculative parallelization using software multi-threaded transactions

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Speculative parallelization of sequential loops on multicores

International Journal of Parallel Programming
A profile-based tool for finding pipeline parallelism in sequential programs

Parallel Computing
Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
The Paralax infrastructure: automatic parallelization with a helping hand

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Lime: a Java-compatible and synthesizable language for heterogeneous architectures

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Energy- and Performance-Efficient Communication Framework for Embedded MPSoCs through Application-Driven Release Consistency

ACM Transactions on Design Automation of Electronic Systems (TODAES)
RMOT: recursion in model order for task execution time estimation in a software pipeline

Proceedings of the Conference on Design, Automation and Test in Europe
Resource recycling: putting idle resources to work on a composable accelerator

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Exposing tunable parameters in multi-threaded numerical code

NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
SD3: A Scalable Approach to Dynamic Data-Dependence Profiling

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Scalable Speculative Parallelization on Commodity Clusters

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Enhanced speculative parallelization via incremental recovery

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
ALTER: exploiting breakable dependences for parallelization

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Parallel pattern detection for architectural improvements

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Safe parallel programming using dynamic dependence hints

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Extending synchronization constructs in openMP to exploit pipeline parallelism on heterogeneous multi-core

ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II
Programmer-assisted automatic parallelization

Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research
Expressing pipeline parallelism using TBB constructs: a case study on what works and what doesn't

Proceedings of the compilation of the co-located workshops on DSM'11, TMC'11, AGERE!'11, AOOPES'11, NEAT'11, & VMIL'11
Efficient execution of time-step computations with pipelined parallelism and inter-thread data locality optimizations

Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Single thread program parallelism with dataflow abstracting thread

ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Characteristics of workloads using the pipeline programming model

ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
HELIX: automatic parallelization of irregular programs for chip multiprocessing

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Fast loop-level data dependence profiling

Proceedings of the 26th ACM international conference on Supercomputing
Multi-slicing: a compiler-supported parallel approach to data dependence profiling

Proceedings of the 2012 International Symposium on Software Testing and Analysis
Automatic generation of software pipelines for heterogeneous parallel systems

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Sigma*: symbolic learning of input-output specifications

POPL '13 Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
General data structure expansion for multi-threading

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Pipelets: self-organizing software pipelines for many-core architectures

Proceedings of the Conference on Design, Automation and Test in Europe
Computational caches

Proceedings of the 6th International Systems and Storage Conference
Runtime resource allocation for software pipelines

Proceedings of the 16th International Workshop on Software and Compilers for Embedded Systems
On-the-fly pipeline parallelism

Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Optimizations for configuring and mapping software pipelines in many core systems

Proceedings of the 50th Annual Design Automation Conference
A catalog of stream processing optimizations

ACM Computing Surveys (CSUR)
An automatic thread decomposition approach for pipelined multithreading

International Journal of High Performance Computing and Networking
Multi-objective exploitation of pipeline parallelism using clustering, replication and duplication in embedded multi-core systems

Journal of Systems Architecture: the EUROMICRO Journal
HEAP: A Highly Efficient Adaptive multi-Processor framework

Microprocessors & Microsystems
Integrating profile-driven parallelism detection and machine-learning-based mapping

ACM Transactions on Architecture and Code Optimization (TACO)
Accelerating sequential programs on commodity multi-core processors

Journal of Parallel and Distributed Computing

Abstract

The emergence of multicore processors has heightened the need for effective parallel programming practices. In addition to writing new parallel programs, the next generation of programmers will be faced with the overwhelming task of migrating decades' worth of legacy C code into a parallel representation. Addressing this problem requires a toolset of parallel programming primitives that can broadly apply to both new and existing programs. While tools such as threads and OpenMP allow programmers to express task and data parallelism, support for pipeline parallelism is distinctly lacking. In this paper, we offer a new and pragmatic approach to leveraging coarse-grained pipeline parallelism in C programs. We target the domain of streaming applications, such as audio, video, and digital signal processing, which exhibit regular flows of data. To exploit pipeline parallelism, we equip the programmer with a simple set of annotations (indicating pipeline boundaries) and a dynamic analysis that tracks all communication across those boundaries. Our analysis outputs a stream graph of the application as well as a set of macros for parallelizing the program and communicating the data needed. We apply our methodology to six case studies, including MPEG-2 decoding, MP3 decoding, GMTI radar processing, and three SPEC benchmarks. Our analysis extracts a useful block diagram for each application, and the parallelized versions offer a 2.78x mean speedup on a 4-core machine.
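The idea of tracking communication across annotated pipeline boundaries can be illustrated with a small sketch in C. This is not the authors' tool or annotation syntax: the names `stage_boundary`, `traced_store`, `traced_load`, `edge_loads`, and the toy 16-word memory are all hypothetical, chosen only for illustration. The sketch assumes a single loop iteration, remembers which stage last wrote each word, and counts a producer-to-consumer edge whenever a different stage reads that word. The resulting edge counts form a tiny stream graph.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STAGES 4   /* hypothetical limit on pipeline stages */
#define MEM_WORDS  16  /* size of the toy traced memory */

static int memory[MEM_WORDS];              /* data the stages exchange */
static int last_writer[MEM_WORDS];         /* stage that last wrote each word, -1 if none */
static int edges[MAX_STAGES][MAX_STAGES];  /* edges[p][c]: loads in stage c of data written by stage p */
static int current_stage;

/* Clear all tracking state and start again at stage 0. */
void trace_reset(void) {
    for (int i = 0; i < MEM_WORDS; i++) last_writer[i] = -1;
    memset(edges, 0, sizeof edges);
    current_stage = 0;
}

/* Hypothetical annotation: marks the boundary between two pipeline stages. */
void stage_boundary(void) {
    assert(current_stage + 1 < MAX_STAGES);
    current_stage++;
}

/* Instrumented store: record the current stage as the producer of this word. */
void traced_store(int addr, int value) {
    assert(addr >= 0 && addr < MEM_WORDS);
    memory[addr] = value;
    last_writer[addr] = current_stage;
}

/* Instrumented load: if another stage produced the word, count an edge. */
int traced_load(int addr) {
    assert(addr >= 0 && addr < MEM_WORDS);
    int producer = last_writer[addr];
    if (producer >= 0 && producer != current_stage)
        edges[producer][current_stage]++;
    return memory[addr];
}

/* Number of cross-stage loads recorded from producer p to consumer c. */
int edge_loads(int p, int c) { return edges[p][c]; }

/* Print every non-empty edge of the recorded stream graph. */
void print_stream_graph(void) {
    for (int p = 0; p < MAX_STAGES; p++)
        for (int c = 0; c < MAX_STAGES; c++)
            if (edges[p][c] > 0)
                printf("stage %d -> stage %d: %d load(s)\n", p, c, edges[p][c]);
}

/* One iteration of a toy three-stage loop body, split by two boundaries. */
int run_demo_iteration(void) {
    trace_reset();
    /* Stage 0: produce four input samples (1, 2, 3, 4). */
    for (int i = 0; i < 4; i++) traced_store(i, i + 1);
    stage_boundary();
    /* Stage 1: scale each sample by two. */
    for (int i = 0; i < 4; i++) traced_store(4 + i, 2 * traced_load(i));
    stage_boundary();
    /* Stage 2: sum the scaled samples plus the first raw sample. */
    int sum = traced_load(0);
    for (int i = 0; i < 4; i++) sum += traced_load(4 + i);
    return sum;
}
```

Calling `run_demo_iteration()` and then `print_stream_graph()` from a `main` function would report three edges: stage 0 to stage 1, stage 0 to stage 2, and stage 1 to stage 2. In a real setting, the traced loads and stores would come from instrumenting the running program rather than from hand-written calls, and the recorded edges would indicate which data each parallel stage must receive from its predecessors.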