Stream processors are gaining popularity and are being deployed in many multimedia and scientific applications. The stream register file (SRF) is a non-bypassing, software-managed on-chip memory and a critical resource in stream processors. Unlike with conventional register files, all input data must be stored in the SRF while a program is executing. When a program is loaded from off-chip memory into the SRF for execution, storage consumption and data transfer time are the two key factors that affect performance. This work applies loop transformations to programs for SRF optimization, with two objectives: minimizing storage consumption and minimizing data transfer time. Previous techniques concentrate on SRF utilization only; this is the first paper to consider both factors. We present a cost evaluation function and apply loop fusion and reordering to improve the performance of stream processors. Experimental results show significant performance improvement.
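As a rough illustration of why loop fusion can reduce storage consumption, the following minimal Python sketch compares two producer/consumer loops before and after fusion. It is not the paper's algorithm or its cost evaluation function; the function names `unfused` and `fused`, the arithmetic in the loop bodies, and the "peak intermediate elements" count are all assumptions made for the example.

```python
def unfused(a):
    """Two separate loops: the first materializes a full intermediate stream."""
    tmp = [x * 2 for x in a]        # loop 1: produces len(a) intermediate elements
    out = [t + 1 for t in tmp]      # loop 2: consumes the intermediate stream
    return out, len(tmp)            # result and intermediate elements held at once


def fused(a):
    """One fused loop: the intermediate value lives only in a scalar."""
    out = []
    for x in a:
        t = x * 2                   # body of loop 1
        out.append(t + 1)           # body of loop 2, applied immediately
    return out, 1                   # result and intermediate elements held at once


if __name__ == "__main__":
    data = [1, 2, 3]
    print(unfused(data))
    print(fused(data))
```

Both versions compute the same result, but the fused version never holds the whole intermediate stream, which is the kind of storage saving that loop fusion targets in a capacity-limited on-chip memory.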