Partition Scheduling with Prefetching (PSP) is a memory latency hiding technique that combines loop pipelining with data prefetching. In PSP, the iteration space is first divided into regular partitions. Two parts of the schedule, the ALU part and the memory part, are then produced and balanced to yield an overall schedule with high throughput. Because the two parts execute simultaneously, the remote memory latency is overlapped with computation. We study the optimal partition shape and size so that a well-balanced overall schedule can be obtained. Experiments on DSP benchmarks show that the proposed methodology consistently produces optimal or near-optimal solutions.
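The balancing idea can be illustrated with a toy model. The sketch below is not the PSP algorithm itself: it assumes a two-dimensional iteration space cut into rectangular partitions of `fx` by `fy` iterations, a fixed ALU cost per iteration, and a hypothetical footprint model in which the number of prefetches grows with the partition boundary. All function names, parameters, and cost formulas are assumptions introduced for illustration. Since the ALU part and the memory part run at the same time, the length of a partition's schedule is taken as the longer of the two, and a brute-force search picks the shape with the lowest cost per iteration.

```python
from itertools import product


def alu_length(fx, fy, alu_cycles_per_iter):
    """Length of the ALU part: every iteration in the partition is computed."""
    return fx * fy * alu_cycles_per_iter


def memory_length(fx, fy, dep_x, dep_y, prefetch_cycles):
    """Length of the memory part under an assumed boundary-footprint model.

    Assumption: data must be prefetched for a strip of width dep_x along one
    edge and dep_y along the other, so the count scales with the perimeter.
    """
    prefetches = dep_x * fy + dep_y * fx
    return prefetches * prefetch_cycles


def best_partition(max_fx, max_fy, alu_cycles_per_iter,
                   dep_x, dep_y, prefetch_cycles, capacity):
    """Exhaustively search partition shapes and return (fx, fy, alu, mem).

    capacity is a hypothetical limit on iterations per partition (for
    example, imposed by local memory size). Ties are broken in favour of
    the smaller partition.
    """
    best_key = None
    best = None
    for fx, fy in product(range(1, max_fx + 1), range(1, max_fy + 1)):
        size = fx * fy
        if size > capacity:
            continue
        alu = alu_length(fx, fy, alu_cycles_per_iter)
        mem = memory_length(fx, fy, dep_x, dep_y, prefetch_cycles)
        # The two parts overlap, so the longer one bounds the schedule.
        per_iteration = max(alu, mem) / size
        key = (per_iteration, size)
        if best_key is None or key < best_key:
            best_key = key
            best = (fx, fy, alu, mem)
    return best


if __name__ == "__main__":
    fx, fy, alu, mem = best_partition(
        max_fx=16, max_fy=16, alu_cycles_per_iter=1,
        dep_x=1, dep_y=1, prefetch_cycles=4, capacity=64)
    print(f"partition {fx}x{fy}: ALU part {alu} cycles, memory part {mem} cycles")
```

With these toy numbers the search settles on an 8 by 8 partition, where the ALU part and the memory part both take 64 cycles. In this model, shapes that are narrower or smaller leave the memory part longer than the ALU part, so some prefetch latency would remain exposed.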