Proceedings of the 48th Design Automation Conference
In recent years, stream applications have become prevalent in many embedded domains. Stream applications distinguish themselves from traditional sequential programs through well-defined independent actors, explicit data communication, and stable code/data access patterns. To achieve high performance and low power, scratchpad memory (SPM) has been introduced in today's embedded multicore processors. Programming SPM-based architectures is both challenging and time-consuming. In this paper we address the problem of automatically compiling stream applications onto SPM-based embedded multicore processors through unrolling and retiming. In our technique, code overlay and data overlay are implemented to overcome the limited SPM capacity. Smart double buffering and code prefetching are introduced to amortize memory access delays. We evaluated the efficiency of our technique by compiling several stream applications onto the IBM Cell processor and comparing their performance with that of existing approaches.
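To illustrate why double buffering can amortize memory access delays, the following is a minimal timing sketch of generic double buffering. It is not the paper's "smart" variant, whose details the abstract does not give. The function names, the single serial DMA engine, and the fixed per-block costs `dma` and `compute` are all assumptions made for illustration.

```python
def sequential_time(n_blocks, dma, compute):
    """One buffer: every block is fetched, then processed, with no overlap."""
    return n_blocks * (dma + compute)


def double_buffered_time(n_blocks, dma, compute):
    """Two buffers: the fetch of block i+1 may overlap the processing of block i.

    Assumed model (hypothetical): one DMA engine that serves fetches in order,
    one core, and two SPM buffers used alternately.
    """
    fetch_end = []    # time at which block i has arrived in the SPM
    compute_end = []  # time at which block i has been processed
    for i in range(n_blocks):
        # The DMA engine is busy until the previous fetch has finished.
        dma_free = fetch_end[i - 1] if i >= 1 else 0
        # Block i reuses the buffer of block i-2, which must be consumed first.
        buf_free = compute_end[i - 2] if i >= 2 else 0
        fetch_end.append(max(dma_free, buf_free) + dma)
        # The core needs both the data and the end of the previous computation.
        core_free = compute_end[i - 1] if i >= 1 else 0
        compute_end.append(max(fetch_end[i], core_free) + compute)
    return compute_end[-1] if compute_end else 0
```

Under this model, once the pipeline is full, each block costs roughly the larger of the transfer time and the compute time rather than their sum, so only the first transfer (or the last computation) remains exposed.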