Stream processors are gaining popularity and are being deployed in many multimedia and scientific applications. The Stream Register File (SRF) is a non-bypassing, software-managed on-chip memory and a critical resource in stream processors. When a program is loaded from off-chip memory into the SRF for execution, storage consumption and data transfer time are the two key factors that affect performance. This work applies loop transformations to programs to optimize SRF usage, with two objectives: minimizing storage consumption and minimizing data transfer time. Previous techniques concentrate on SRF utilization only; this is the first paper to consider both factors. We present a cost evaluation function and apply loop fusion and reordering to improve the performance of stream processors. Experimental results show significant performance improvement.
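As a rough illustration of why loop fusion can reduce both objectives, the following minimal Python sketch compares two loops connected by an intermediate array with their fused form. It is not the paper's method: the function names, the word counts, and the weighted-sum `toy_cost` are all hypothetical, and the transfer counts assume the intermediate array does not fit on chip and must round-trip through off-chip memory.

```python
def unfused(a):
    """Two separate loops; the intermediate array tmp is materialized in full."""
    n = len(a)
    tmp = [2 * x for x in a]        # loop 1: produce intermediate stream
    out = [t + 1 for t in tmp]      # loop 2: consume intermediate stream
    storage_words = n               # tmp holds n elements at once
    # Assumed traffic: load a, store tmp, reload tmp, store out.
    transfer_words = 4 * n
    return out, storage_words, transfer_words


def fused(a):
    """Single fused loop; the intermediate value lives in one scalar."""
    n = len(a)
    out = []
    for x in a:
        t = 2 * x                   # intermediate is never stored as an array
        out.append(t + 1)
    storage_words = 1               # only the scalar t is live
    # Assumed traffic: load a, store out.
    transfer_words = 2 * n
    return out, storage_words, transfer_words


def toy_cost(storage_words, transfer_words, alpha=1.0, beta=1.0):
    """Hypothetical cost: weighted sum of storage and transfer (not the paper's function)."""
    return alpha * storage_words + beta * transfer_words


data = [1, 2, 3, 4]
out_u, s_u, t_u = unfused(data)
out_f, s_f, t_f = fused(data)
print(out_u == out_f, toy_cost(s_u, t_u), toy_cost(s_f, t_f))
```

Under these assumptions the fused loop computes the same result with less intermediate storage and less traffic, which is the kind of trade-off a cost evaluation function would need to weigh when choosing which loops to fuse or reorder.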