Effective loop partitioning and scheduling under memory and register dual constraints

Authors:
Chun Jason Xue;Edwin H.-M. Sha;Zili Shao;Meikang Qiu
Affiliations:
City University of Hong Kong;U. of Texas at Dallas, Richardson, Taxas;Hong Kong Polytechnic U., Hong Kong;U. of New Orleans
Venue:
Proceedings of the conference on Design, automation and test in Europe
Year:
2008

Citing 10
Cited 5

Cache miss heuristics and preloading techniques for general-purpose programs

Proceedings of the 28th annual international symposium on Microarchitecture
Stage scheduling: a technique to reduce the register requirements of a modulo schedule

Proceedings of the 28th annual international symposium on Microarchitecture
Optimizing Overall Loop Schedules Using Prefetching and Partitioning

IEEE Transactions on Parallel and Distributed Systems
Scheduling and partitioning for multiple loop nests

Proceedings of the 14th international symposium on Systems synthesis
Sequential Hardware Prefetching in Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors

ICPP '97 Proceedings of the international Conference on Parallel Processing
RCRS: A Framework for Loop Scheduling with Limited Number of Registers

GLS '98 Proceedings of the Great Lakes Symposium on VLSI '98
Loop Scheduling and Partitions for Hiding Memory Latencies

Proceedings of the 12th international symposium on System synthesis
Guided region prefetching: a cooperative hardware/software approach

Proceedings of the 30th annual international symposium on Computer architecture
Rotation scheduling: a loop pipelining algorithm

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Compiler-assisted leakage-aware loop scheduling for embedded VLIW DSP processors

Journal of Systems and Software
Memory access optimization in compilation for coarse-grained reconfigurable architectures

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Loop fusion and reordering for register file optimization on stream processors

Journal of Systems and Software
Instruction Cache Locking for Embedded Systems using Probability Profile

Journal of Signal Processing Systems
Boosting efficiency of fault detection and recovery throughapplication-specific comparison and checkpointing

Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Loops are the most important sections for embedded applications. To achieve high performance, two loop transformation techniques are often applied, namely loop pipelining and loop partitioning. loop pipelining is an effective approach to increase parallelism and reduce schedule length. Loop partitioning with prefetching increases data locality and hides memory latency. However, loop pipelining increases register pressure and loop partitioning increases local memory requirement. As most embedded systems have limited number of registers and limited memory, without careful study, these two techniques can not be applied effectively. In this paper, we propose an effective scheduling framework, Register and Memory Sensitive Partitioning(RMSP), to minimize average schedule length per iteration under register and memory dual constraints for parallel embedded systems. Experiments show that RMSP reduces schedule length by 14.1% in average compared to previous methods applied directly.