Load scheduling: reducing pressure on distributed register files for free

Authors:
Mei Wen; Nan Wu; Maolin Guan; Chunyuan Zhang
Affiliation (all authors):
National Laboratory for Parallel & Distributed Processing, Chang Sha, Hu Nan, P.R. of China
Venue:
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Year:
2008

Abstract

In this paper we describe load scheduling, a novel method that balances the load (register pressure) among register files by exploiting residual, otherwise unused, resources. Load scheduling reduces register pressure for clustered VLIW processors with distributed register files without increasing the VLIW schedule length. We have implemented load scheduling in the compiler for the Imagine and FT64 stream processors. The results show that the technique effectively reduces the number of variables spilled to memory and in some cases eliminates spilling entirely. The algorithm is particularly effective for embedded processors with limited register resources, because it improves register utilization rather than increasing the number of registers required.
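To make the idea concrete, here is a minimal, hypothetical Python sketch; it is not the authors' algorithm, which the abstract does not spell out. It assumes a fixed VLIW schedule of a known number of cycles, register files of equal `capacity`, values with known live ranges, and a per-cycle count of idle inter-cluster transfer slots (`free_slots`) standing in for the "residual resources". A value is moved out of an overloaded register file only if an idle slot exists in its defining cycle and the target file has room over its whole live range, so the schedule length never changes. Worst-cycle overflow (`excess`) serves as a crude proxy for the number of spilled variables. All names (`Var`, `pressure`, `excess`, `fits`, `balance`) are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str
    start: int  # cycle in which the value is defined
    end: int    # first cycle after its last use (live range is [start, end))
    home: int   # index of the register file (cluster) currently holding it


def pressure(vars_, rf, t):
    """Number of values live in register file `rf` at cycle `t`."""
    return sum(1 for v in vars_ if v.home == rf and v.start <= t < v.end)


def excess(vars_, num_rf, capacity, length):
    """Crude spill proxy: per file, worst-cycle overflow beyond `capacity`."""
    return sum(
        max(0, max(pressure(vars_, rf, t) for t in range(length)) - capacity)
        for rf in range(num_rf)
    )


def fits(vars_, v, rf, capacity):
    """True if `v` can move into `rf` without overflowing it at any live cycle."""
    return all(pressure(vars_, rf, t) < capacity for t in range(v.start, v.end))


def balance(vars_, num_rf, capacity, free_slots):
    """Greedily move values out of overloaded register files (hypothetical).

    A move is taken only if an idle transfer slot exists in the value's
    defining cycle, so no cycle is ever added to the schedule.
    Mutates `vars_` and `free_slots`; returns the list of moves made.
    """
    moved = []
    for v in vars_:
        src = v.home
        overloaded = any(pressure(vars_, src, t) > capacity
                         for t in range(v.start, v.end))
        if not overloaded or free_slots[v.start] == 0:
            continue
        for dst in range(num_rf):
            if dst != src and fits(vars_, v, dst, capacity):
                v.home = dst
                free_slots[v.start] -= 1  # consume a residual (idle) slot
                moved.append((v.name, src, dst))
                break
    return moved


# Two register files of 2 registers each, a 4-cycle schedule.
vs = [Var("a", 0, 4, 0), Var("b", 0, 4, 0), Var("c", 1, 3, 0), Var("d", 0, 2, 1)]
print(excess(vs, 2, 2, 4))              # → 1 (file 0 holds 3 values in cycles 1-2)
print(balance(vs, 2, 2, [1, 1, 1, 1]))  # → [('a', 0, 1)]
print(excess(vs, 2, 2, 4))              # → 0
```

Moving `a` to the second register file uses the one idle slot in cycle 0 and removes the overflow. With `free_slots` all zero, nothing moves and the overflow remains: in this toy model, rebalancing is only as good as the slack the existing schedule leaves behind.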