StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
Proceedings of the International Symposium on Code Generation and Optimization
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
The GeForce 6 series GPU architecture
SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
Orchestrating the execution of stream programs on multicore platforms
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs
Languages and Compilers for Parallel Computing
Stream Compilation for Real-Time Embedded Multicore Systems
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Compilation of stream programs for multicore processors that incorporate scratchpad memories
Proceedings of the Conference on Design, Automation and Test in Europe
Executing synchronous dataflow graphs on a SPM-based multicore architecture
Proceedings of the 49th Annual Design Automation Conference
Unrolling and retiming of stream applications onto embedded multicore processors
Proceedings of the 49th Annual Design Automation Conference
Integrating software caches with scratch pad memory
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Dynamic scheduling of stream programs on embedded multi-core processors
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A lifetime aware buffer assignment method for streaming applications on DRAM/PRAM hybrid memory
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
A software-only scheme for managing heap data on limited local memory(LLM) multicore processors
ACM Transactions on Embedded Computing Systems (TECS)
Maximum-throughput mapping of SDFGs on multi-core SoC platforms
Journal of Parallel and Distributed Computing
SPM-Sieve: a framework for assisting data partitioning in scratch pad memory based systems
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Hi-index | 0.00 |
The prevalence of stream applications in signal processing, multi-media, and network processing domains has resulted in a new trend of programming and architecture design. Several languages and multicore architectures have been developed to support streaming applications. In many of these multicore architectures scratchpad memories (SPM) have substituted caches due to their lower power consumption. Performance optimization on SPM based architectures requires the programmer/compiler to efficiently manage the limited local memory. Our paper addresses the problem of compilation of stream programs onto multicore architectures that incorporate SPMs. We propose a retiming technique that maximizes the throughput under a memory constraint with a user-specified number of software pipeline stages. Trade-offs between double buffering and code overlay are explored intensively in our technique to achieve the best performance. The efficiency of our technique was evaluated by compiling several stream applications for the IBM Cell BE and comparing their results against existing approaches.