Loop fusion and reordering for register file optimization on stream processors

Authors:
Wanyong Tian; Chun Jason Xue; Minming Li; Enhong Chen
Affiliations:
School of Computer Science and Technology, University of Science and Technology of China, China and Department of Computer Science, City University of Hong Kong, Hong Kong and USTC-CityU Joint Res ...; Department of Computer Science, City University of Hong Kong, Hong Kong; Department of Computer Science, City University of Hong Kong, Hong Kong; School of Computer Science and Technology, University of Science and Technology of China, China
Venue:
Journal of Systems and Software
Year:
2012

Abstract

Stream processors are gaining popularity and are being deployed in many multimedia and scientific applications. The stream register file (SRF) is a non-bypassing, software-managed on-chip memory and a critical resource in stream processors. Unlike conventional register files, the SRF must hold all of the input data while a program is being executed. When a program is loaded from off-chip memory into the SRF for execution, storage consumption and data transfer time are the two key factors that affect performance. This work applies loop transformations to programs for SRF optimization, with two objectives: minimizing storage consumption and minimizing data transfer time. Previous techniques concentrate on SRF utilization only; this is the first paper to consider both factors. We present a cost evaluation function and apply loop fusion and reordering to improve the performance of stream processors. The experimental results show significant performance improvement.
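
As a generic illustration of loop fusion (not the cost evaluation function or the algorithm from the paper), the following minimal Python sketch shows how fusing a producer loop with its consumer can remove a full-length intermediate buffer. The function names and the way intermediate storage is counted are assumptions made purely for illustration.

```python
def unfused(a):
    """Two separate loops: the first materializes a full intermediate stream."""
    tmp = [x * 2 for x in a]        # loop 1: intermediate buffer of len(a) elements
    out = [t + 1 for t in tmp]      # loop 2: consumes the intermediate buffer
    # Second value: number of intermediate elements held at once (illustrative metric).
    return out, len(tmp)


def fused(a):
    """One fused loop: each intermediate value is consumed as soon as it is produced."""
    out = []
    for x in a:
        t = x * 2                   # intermediate value lives in a single scalar
        out.append(t + 1)
    # Only one intermediate element is live at any time.
    return out, 1


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(unfused(data))
    print(fused(data))
```

Both versions compute the same output; the fused one never stores the whole intermediate stream, which is the kind of storage saving that fusion targets in a capacity-limited on-chip memory.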