Optimal two level partitioning and loop scheduling for hiding memory latency for DSP applications

Authors:
Zhong Wang;Michael Kirkpatrick;Edwin Hsing-Mean Sha
Affiliations:
Dept of Comp Sci & Engr, University of Notre Dame, Notre Dame, IN;Dept of Comp Sci & Engr, University of Notre Dame, Notre Dame, IN;Dept of Comp Sci & Engr, University of Notre Dame, Notre Dame, IN
Venue:
Proceedings of the 37th Annual Design Automation Conference
Year:
2000

Abstract

The large latency of memory accesses in modern computers is a key obstacle to achieving high processor utilization. To hide this latency, this paper proposes a new memory management technique that can be applied to computer architectures with three levels of memory. The technique takes advantage of access pattern information available at compile time by prefetching certain data elements from the higher-level memory. It also retains certain data for a period of time to prevent unnecessary data swapping. Partitioning the iteration space and executing one partition at a time improves data locality considerably over the usual execution pattern. Together, these approaches improve average execution time by approximately 35% over the one-level partition algorithm and by more than 80% over list scheduling and hardware prefetching.
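
The general idea of partitioned execution with prefetching can be illustrated with a small sketch. The code below is not the paper's algorithm: it assumes a simple one-level rectangular partitioning of a two-dimensional iteration space and models prefetching as a software double buffer, with no loop-carried dependences and no scheduling step. The names `partition_iteration_space`, `prefetch`, and `run_partitioned` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def partition_iteration_space(n_rows, n_cols, tile_rows, tile_cols):
    """Yield rectangular partitions of a 2-D iteration space as (row_range, col_range)."""
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            yield (range(r0, min(r0 + tile_rows, n_rows)),
                   range(c0, min(c0 + tile_cols, n_cols)))


def prefetch(data, tile):
    """Copy one partition's elements into a dict that stands in for fast local memory."""
    rows, cols = tile
    return {(i, j): data[i][j] for i in rows for j in cols}


def run_partitioned(data, tile_rows, tile_cols):
    """Sum all elements partition by partition; return (total, visit order).

    Assumes `data` is a non-empty rectangular list of lists.
    """
    tiles = list(partition_iteration_space(len(data), len(data[0]),
                                           tile_rows, tile_cols))
    total, visited = 0, []
    next_buffer = prefetch(data, tiles[0])
    for k, tile in enumerate(tiles):
        local = next_buffer
        # Request the next partition's data before computing on the current one.
        # On real hardware this transfer would overlap with the computation;
        # here it simply runs first.
        next_buffer = prefetch(data, tiles[k + 1]) if k + 1 < len(tiles) else {}
        for i, j in product(*tile):
            total += local[(i, j)]  # every access hits the local buffer
            visited.append((i, j))
    return total, visited
```

Calling `run_partitioned(data, 2, 3)` on a 4 x 6 array visits all 24 iteration points exactly once, finishing each 2 x 3 partition before moving to the next. The partition size is the tuning knob in this sketch: larger partitions need more local memory but leave more computation available to cover each transfer.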