Cache miss heuristics and preloading techniques for general-purpose programs
Proceedings of the 28th annual international symposium on Microarchitecture
Stage scheduling: a technique to reduce the register requirements of a modulo schedule
Proceedings of the 28th annual international symposium on Microarchitecture
Optimizing Overall Loop Schedules Using Prefetching and Partitioning
IEEE Transactions on Parallel and Distributed Systems
Scheduling and partitioning for multiple loop nests
Proceedings of the 14th international symposium on Systems synthesis
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
RCRS: A Framework for Loop Scheduling with Limited Number of Registers
GLS '98 Proceedings of the Great Lakes Symposium on VLSI '98
Loop Scheduling and Partitions for Hiding Memory Latencies
Proceedings of the 12th international symposium on System synthesis
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
Rotation scheduling: a loop pipelining algorithm
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Compiler-assisted leakage-aware loop scheduling for embedded VLIW DSP processors
Journal of Systems and Software
Memory access optimization in compilation for coarse-grained reconfigurable architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Loop fusion and reordering for register file optimization on stream processors
Journal of Systems and Software
Instruction Cache Locking for Embedded Systems using Probability Profile
Journal of Signal Processing Systems
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Hi-index | 0.00 |
Loops are the most important sections for embedded applications. To achieve high performance, two loop transformation techniques are often applied, namely loop pipelining and loop partitioning. loop pipelining is an effective approach to increase parallelism and reduce schedule length. Loop partitioning with prefetching increases data locality and hides memory latency. However, loop pipelining increases register pressure and loop partitioning increases local memory requirement. As most embedded systems have limited number of registers and limited memory, without careful study, these two techniques can not be applied effectively. In this paper, we propose an effective scheduling framework, Register and Memory Sensitive Partitioning(RMSP), to minimize average schedule length per iteration under register and memory dual constraints for parallel embedded systems. Experiments show that RMSP reduces schedule length by 14.1% in average compared to previous methods applied directly.