Automatic extraction of pipeline parallelism for embedded heterogeneous multi-core platforms

Authors:
Daniel Cordes, Michael Engel, Olaf Neugebauer, Peter Marwedel
Affiliation:
TU Dortmund University, Dortmund, Germany (all four authors)
Venue:
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Year:
2013

Abstract

Automatic parallelization of sequential applications is key to the efficient use and optimization of current and future embedded multi-core systems. However, existing approaches often fail to balance tasks efficiently across the heterogeneous cores of an MPSoC, frequently because they lack sufficient knowledge of the performance of the underlying architecture. In this paper, we present a novel parallelization approach for embedded MPSoCs that combines pipeline parallelization of loops with knowledge of how a task's execution time differs across cores with different performance properties. Using integer linear programming (ILP), we derive a solution that is optimal with respect to the model used and yields tasks with well-balanced execution behavior. We evaluate our pipeline parallelization approach for heterogeneous MPSoCs on a set of standard embedded benchmarks and compare it with two existing state-of-the-art approaches. For all benchmarks, our approach achieves significantly higher speedups on heterogeneous MPSoCs than either of the two existing approaches.
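To make the balancing problem concrete, here is a minimal sketch. It is not the paper's model. It assumes that each pipeline stage of a loop body is mapped to exactly one core, that communication costs are negligible, and that the per-iteration time is set by the most heavily loaded core. The function names `iteration_interval` and `best_assignment` and all timing values are hypothetical. Instead of integer linear programming, the sketch uses exhaustive search. This returns an optimal mapping for the simplified model but grows exponentially with the number of stages, which is where a solver-based formulation such as ILP becomes preferable.

```python
from itertools import product


def iteration_interval(assignment, times):
    """Per-iteration time of the pipeline: the load of the busiest core.

    assignment[s] is the core index chosen for stage s.
    times[s][c] is the (hypothetical) execution time of stage s on core c.
    """
    n_cores = len(times[0])
    load = [0] * n_cores
    for stage, core in enumerate(assignment):
        load[core] += times[stage][core]
    return max(load)


def best_assignment(times):
    """Try every stage-to-core mapping and keep the one with the smallest
    iteration interval (exhaustive search, feasible only for tiny inputs)."""
    n_stages, n_cores = len(times), len(times[0])
    return min(product(range(n_cores), repeat=n_stages),
               key=lambda a: iteration_interval(a, times))


# Hypothetical timings: core 0 is fast, core 1 runs every stage 2x slower.
times = [[4, 8], [2, 4], [3, 6]]
a = best_assignment(times)
print(a, iteration_interval(a, times))  # (0, 0, 1) 6
```

The example shows why per-core execution times matter. A split that balances the stages as if both cores were identical (stage 0 alone versus stages 1 and 2) gives an iteration interval of 8 or 10, depending on which core receives which half. The heterogeneity-aware mapping reaches 6.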