Some efficient solutions to the affine scheduling problem: I. One-dimensional time
International Journal of Parallel Programming
Compaan: deriving process networks from Matlab for embedded signal processing architectures
CODES '00 Proceedings of the eighth international workshop on Hardware/software codesign
Scheduling and Automatic Parallelization
Scheduling and Automatic Parallelization
Loop Parallelization in the Polytope Model
CONCUR '93 Proceedings of the 4th International Conference on Concurrency Theory
System Design Using Kahn Process Networks: The Compaan/Laura Approach
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Scalable and structured scheduling
International Journal of Parallel Programming
A GPGPU compiler for memory optimization and parallelism management
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
A data parallel view on polyhedral process networks
Proceedings of the 14th International Workshop on Software and Compilers for Embedded Systems
Automatic C-to-CUDA code generation for affine programs
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Hi-index | 0.00 |
With advances in manycore and accelerator architectures, the high performance and embedded spaces are rapidly converging. Emerging architectures feature different forms of parallelism. The Polyhedral Processes Networks (PPNs) are a proven model of choice for automated generation of pipeline and task parallel programs from sequential source code, however data parallelism is not addressed. In this paper, we present asystematic approach for identification and extraction of fine grain data parallelism from the PPN specification. The approach is implemented in a tool, called kpn2gpu, which produces fine-grain data parallel CUDA kernels for graphics processing units (GPUs). First experiments indicate that generated applications have a potential to exploit different forms of parallelism provided by the architecture and that kernels feature a highly regular structure that allows subsequent optimizations.