Loop fusion and reordering for register file optimization on stream processors

Authors:
Wanyong Tian;Chun Jason Xue;Minming Li;Enhong Chen
Affiliations:
University of Science and Technology of China and City University of Hong Kong;City University of Hong Kong;City University of Hong Kong;University of Science and Technology of China
Venue:
Proceedings of the 2011 ACM Symposium on Applied Computing
Year:
2011



Abstract

Stream processors are gaining popularity and are being deployed in many multimedia and scientific applications. The Stream Register File (SRF) is a non-bypassing, software-managed on-chip memory and a critical resource in stream processors. When a program is loaded from off-chip memory into the SRF for execution, storage consumption and data transfer time are the two key factors that affect performance. This work applies loop transformations to programs for SRF optimization, with two objectives: minimizing storage consumption and minimizing data transfer time. Previous techniques concentrate only on SRF utilization; this is the first paper to consider both factors. We present a cost evaluation function and apply loop fusion and reordering to improve the performance of stream processors. The experimental results show significant performance improvement.
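
As a rough illustration of why loop fusion can reduce on-chip storage, the sketch below contrasts two separate loops, which materialize a full intermediate stream, with a single fused loop that consumes each intermediate value immediately. This is a generic textbook example and not the paper's algorithm or cost function; the function names `run_unfused` and `run_fused` and the `buffered` counter are hypothetical and introduced only for this illustration.

```python
def run_unfused(a, b):
    """Two separate loops: the intermediate stream `tmp` is fully materialized."""
    tmp = []
    for x, y in zip(a, b):      # loop 1 produces the intermediate stream
        tmp.append(x + y)
    out = []
    for t in tmp:               # loop 2 consumes the intermediate stream
        out.append(2 * t)
    buffered = len(tmp)         # intermediate elements that had to be stored
    return out, buffered


def run_fused(a, b):
    """One fused loop: each intermediate value is used as soon as it is produced."""
    out = []
    for x, y in zip(a, b):
        t = x + y               # lives only for this iteration
        out.append(2 * t)
    return out, 0               # no intermediate stream is stored


if __name__ == "__main__":
    print(run_unfused([1, 2, 3], [4, 5, 6]))
    print(run_fused([1, 2, 3], [4, 5, 6]))
```

Both versions compute the same result; the fused version simply never holds the intermediate stream. Whether fusion pays off on a real stream processor depends on factors such as SRF capacity and transfer costs, which is the kind of trade-off a cost evaluation function is meant to capture.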