Loop Scheduling and Partitions for Hiding Memory Latencies

Authors:
Fei Chen; Edwin Hsing-Mean Sha
Affiliations:
Dept. of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN (both authors)
Venue:
Proceedings of the 12th International Symposium on System Synthesis
Year:
1999

Abstract

Partition Scheduling with Prefetching (PSP) is a memory latency hiding technique that combines loop pipelining with data prefetching. In PSP, the iteration space is first divided into regular partitions. Two parts of the schedule, the ALU part and the memory part, are then produced and balanced to yield an overall schedule with high throughput. Because the two parts execute simultaneously, remote memory latency is overlapped with computation. We study the optimal partition shape and size so that a well-balanced overall schedule can be obtained. Experiments on DSP benchmarks show that the proposed methodology consistently produces optimal or near-optimal solutions.
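
The balancing idea in the abstract can be illustrated with a small Python sketch. This is not the authors' algorithm: the function names, the rectangular partition shape, and the cost model (ALU time proportional to partition area, memory time proportional to the number of operands fetched along two partition edges) are all assumptions chosen only to show why partition shape and size affect the balance between the ALU part and the memory part.

```python
from itertools import product


def partition_iteration_space(n_rows, n_cols, p_rows, p_cols):
    """Split an n_rows x n_cols iteration space into rectangular partitions.

    Each partition is returned as a list of (i, j) iteration points.
    """
    partitions = []
    for r0 in range(0, n_rows, p_rows):
        for c0 in range(0, n_cols, p_cols):
            partitions.append([
                (i, j)
                for i in range(r0, min(r0 + p_rows, n_rows))
                for j in range(c0, min(c0 + p_cols, n_cols))
            ])
    return partitions


def schedule_lengths(p_rows, p_cols, alu_cycles_per_iter, prefetch_cycles):
    """Toy cost model (an assumption, not taken from the paper).

    ALU part: every iteration in the partition costs alu_cycles_per_iter.
    Memory part: one remote operand is prefetched per cell along the top
    and left edges of the partition, each costing prefetch_cycles.
    """
    alu_part = p_rows * p_cols * alu_cycles_per_iter
    memory_part = (p_rows + p_cols) * prefetch_cycles
    return alu_part, memory_part


def smallest_balanced_partition(max_side, alu_cycles_per_iter, prefetch_cycles):
    """Find the smallest-area partition whose memory part fits under its ALU part.

    Ties on area are broken in favour of the most square shape.
    Returns (p_rows, p_cols), or None if no shape up to max_side qualifies.
    """
    best_key, best_shape = None, None
    for p_rows, p_cols in product(range(1, max_side + 1), repeat=2):
        alu_part, memory_part = schedule_lengths(
            p_rows, p_cols, alu_cycles_per_iter, prefetch_cycles
        )
        if memory_part <= alu_part:  # prefetching is fully hidden
            key = (p_rows * p_cols, abs(p_rows - p_cols))
            if best_key is None or key < best_key:
                best_key, best_shape = key, (p_rows, p_cols)
    return best_shape


if __name__ == "__main__":
    shape = smallest_balanced_partition(
        max_side=16, alu_cycles_per_iter=1, prefetch_cycles=4
    )
    print("partition shape:", shape)
    print("ALU/memory lengths:", schedule_lengths(*shape, 1, 4))
```

Under this toy model, with one ALU cycle per iteration and four cycles per prefetch, the search settles on an 8 x 8 partition, where both parts take 64 cycles. A real PSP implementation would derive both schedule lengths from the loop's data dependences and the target memory system rather than from fixed constants.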