Automatic parallelization techniques have fascinated researchers for decades. This has resulted in a large number of tools intended to relieve the designer of the burden of manually parallelizing an application. However, most of these tools focus only on minimizing execution time, which drastically reduces their applicability to embedded devices. When applications are parallelized for embedded multiprocessor system-on-chip (MPSoC) devices, it is essential to find good trade-offs between different objectives such as execution time, energy consumption, or communication overhead. Another important aspect is the streaming-based structure found in many embedded applications, such as multimedia and network services. Such applications are best parallelized by extracting pipeline parallelism. Therefore, this paper presents the first multi-objective aware approach that automatically exploits pipeline parallelism, making it well suited to resource-restricted embedded devices. We compared the new pipeline parallelization approach to an existing task-level extraction technique. The evaluation shows that the new approach extracts very efficient multi-objective aware parallelism. In addition, the two approaches were combined, showing that they complement each other perfectly.