Recent years have shown that the advantages of multiprocessor System-on-Chip (MPSoC) architectures can no longer be ignored in the embedded systems domain. Using multiple cores in a single system helps to reconcile the demand for computational power with constraints on energy consumption and heat dissipation. These benefits do not come for free, however. New challenges arise when existing applications have to be ported to multiprocessor platforms. One of the most demanding tasks is extracting efficient parallelism from existing sequential applications. Many parallelization tools have been developed for this purpose, but most of them extract as much parallelism as possible, which is generally not the best choice for embedded systems with their limited hardware and software support. In contrast to previous approaches, we present a new automatic parallelization tool tailored to the particular requirements of resource-constrained embedded systems. This paper presents an algorithm that automatically steers the granularity of the generated tasks with respect to architectural requirements and the overall reduction in execution time. For this purpose, we exploit hierarchical task graphs to simplify a new integer linear programming based approach that splits sequential programs efficiently. Results on real-life benchmarks show that the presented approach speeds up sequential applications by a factor of up to 3.7 on a four-core MPSoC architecture.
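To illustrate why extracting as much parallelism as possible is not always the best choice, the following minimal sketch steers task granularity for the children of a single hierarchical node. It is not the integer linear programming formulation of the paper: it replaces the ILP with an exhaustive search, assumes the child nodes are independent, and assumes a fixed per-task creation overhead paid sequentially by the spawning task. The function name `best_partition`, its parameters, and all cost values are hypothetical.

```python
from itertools import product


def best_partition(costs, max_tasks, task_overhead):
    """Assign independent child nodes to tasks, minimizing the makespan.

    costs         -- estimated execution time of each child node (assumed known)
    max_tasks     -- number of available cores, i.e. upper bound on tasks
    task_overhead -- assumed cost of creating one additional task

    Returns (makespan, assignment), where assignment[i] is the task of child i.
    """
    # Sequential baseline: everything in one task, no creation overhead.
    best = (float(sum(costs)), tuple(0 for _ in costs))
    for assignment in product(range(max_tasks), repeat=len(costs)):
        used = set(assignment)
        if len(used) < 2:
            continue  # single-task solutions are covered by the baseline
        loads = [0.0] * max_tasks
        for cost, task in zip(costs, assignment):
            loads[task] += cost
        # The longest task determines the runtime; every task beyond the
        # first adds creation overhead (a simplifying assumption).
        makespan = max(loads[t] for t in used) + task_overhead * (len(used) - 1)
        if makespan < best[0]:
            best = (makespan, assignment)
    return best


if __name__ == "__main__":
    child_costs = [40, 30, 20, 10]  # hypothetical cost estimates
    for overhead in (5, 100):
        makespan, assignment = best_partition(child_costs, 4, overhead)
        speedup = sum(child_costs) / makespan
        print(overhead, len(set(assignment)), makespan, round(speedup, 2))
```

With a small overhead the search uses several tasks but not necessarily all available cores, and with a large overhead it falls back to sequential execution. A granularity-aware formulation aims to capture this trade-off; an ILP solver would replace the exhaustive loop for realistic problem sizes.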