Compilers: principles, techniques, and tools
Achieving Full Parallelism Using Multidimensional Retiming
IEEE Transactions on Parallel and Distributed Systems
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
Optimal weighted loop fusion for parallel programs
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Scheduling of uniform multidimensional systems under resource constraints
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A tile selection algorithm for data locality and cache interference
ICS '99 Proceedings of the 13th international conference on Supercomputing
Optimal two level partitioning and loop scheduling for hiding memory latency for DSP applications
Proceedings of the 37th Annual Design Automation Conference
On Uniformization of Affine Dependence Algorithms
IEEE Transactions on Computers
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Loop Scheduling and Partitions for Hiding Memory Latencies
Proceedings of the 12th international symposium on System synthesis
Compiler Transformations for High-Performance Computing
An analytical model for loop tiling and its solution
ISPASS '00 Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software
Register aware scheduling for distributed cache clustered architecture
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Performance advantage of reconfigurable cache design on multicore processor systems
International Journal of Parallel Programming
Comprehensive cache performance tuning with a toolset
Future Generation Computer Systems
With the widening performance gap between processors and main memory, efficient memory access behavior is necessary for good program performance. Loop partitioning is an effective way to exploit data locality. Traditional loop partitioning techniques, however, consider only a single nested loop. This paper presents a multiple-loop partition scheduling technique that combines loop partitioning and data padding to generate a detailed partition schedule. Computation and data prefetching are balanced in the partition schedule so that long memory latency can be hidden efficiently. Multiple-loop partition scheduling explores parallelism among computations and exploits data locality both between different loop nests and within each loop nest. Data padding is applied in our technique to eliminate cache interference, which overcomes the problem of cache conflict misses arising from loop partitioning. Therefore, our technique can be applied in architectures with low-associativity caches. The experiments show that multiple-loop partition scheduling achieves significant improvement over existing methods.