POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A practical data flow framework for array reference analysis and its use in optimizations
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Compiling for NUMA parallel machines
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Array data flow analysis for load-store optimizations in fine-grain architectures
International Journal of Parallel Programming - Special issue: selected papers from the eighth international workshop on languages and compilers for parallel computing
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Quantifying the multi-level nature of tiling interactions
International Journal of Parallel Programming
Loop Transformations for Restructuring Compilers: The Foundations
High Performance Compilers for Parallel Computing
DSP Processors Hit the Mainstream
Computer
Collective Loop Fusion for Array Contraction
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Iteration Space Tiling for Memory Hierarchies
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
An Efficient Data Partitioning Method for Limited Memory Embedded Systems
LCTES '98 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems
Compiler Optimizations for Real Time Execution of Loops on Limited Memory Embedded Systems
RTSS '98 Proceedings of the IEEE Real-Time Systems Symposium
Memory Organization for Improved Data Cache Performance in Embedded Processors
ISSS '96 Proceedings of the 9th international symposium on System synthesis
Loop fusion and reordering for register file optimization on stream processors
Proceedings of the 2011 ACM Symposium on Applied Computing
Loop fusion and reordering for register file optimization on stream processors
Journal of Systems and Software
This work proposes a framework for analyzing the flow of values and their reuse in loop nests, with the goal of minimizing data traffic under the constraints of limited on-chip memory capacity and data dependences. The analysis first fuses candidate loop nests intra-procedurally and then performs loop distribution. It computes the closeness factor of two statements, a quantitative measure of the data traffic saved per unit of memory occupied when the statements are placed under the same loop nest rather than under different loop nests. We then develop a greedy algorithm that traverses the program dependence graph (PDG) to group statements legally under the same loop nest. The main idea of the greedy algorithm is to transitively build a group of statements that can legally execute under a given loop nest and that leads to minimum data traffic. We implemented the framework in Petit, a tool for dependence analysis and loop transformations. Our approach eliminates as much as 30% of the traffic in some cases, improving overall completion time by 23.33% for processors such as TI's TMS320C5x.
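The grouping idea can be illustrated with a minimal sketch. This is not the authors' algorithm or the Petit implementation: the function names, the `illegal` pair set (a stand-in for legality checks over the PDG), the per-pair memory cost, and the single `capacity` budget are all assumptions made for illustration. The sketch only shows how a closeness factor, taken here as traffic saved divided by memory occupied, could drive a greedy, transitive merge of statements into loop-nest groups.

```python
def closeness(traffic_saved, memory_occupied):
    """Data traffic saved per unit of on-chip memory occupied (assumed units)."""
    return traffic_saved / memory_occupied


def greedy_group(statements, pairs, illegal, capacity):
    """Greedily merge statements into groups that would share one loop nest.

    statements: list of statement names
    pairs:      dict frozenset({a, b}) -> (traffic_saved, memory_occupied)
    illegal:    set of frozenset({a, b}) that dependences forbid fusing
                (hypothetical stand-in for a PDG legality test)
    capacity:   assumed on-chip memory budget per group
    """
    group_of = {s: frozenset([s]) for s in statements}
    mem_used = {group_of[s]: 0 for s in statements}

    # Visit statement pairs from highest to lowest closeness factor.
    ranked = sorted(pairs.items(),
                    key=lambda kv: closeness(*kv[1]),
                    reverse=True)

    for pair, (_saved, mem) in ranked:
        a, b = tuple(pair)
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue  # already under the same loop nest
        # Merging is transitive: every cross pair must be legal to fuse.
        if any(frozenset((x, y)) in illegal for x in ga for y in gb):
            continue
        needed = mem_used[ga] + mem_used[gb] + mem
        if needed > capacity:
            continue  # merged group would exceed the memory budget
        merged = ga | gb
        for s in merged:
            group_of[s] = merged
        mem_used[merged] = needed

    return set(group_of.values())


if __name__ == "__main__":
    stmts = ["S1", "S2", "S3", "S4"]
    pair_costs = {
        frozenset(("S1", "S2")): (8, 2),
        frozenset(("S2", "S3")): (6, 3),
        frozenset(("S3", "S4")): (9, 3),
    }
    forbidden = {frozenset(("S1", "S3"))}
    for g in greedy_group(stmts, pair_costs, forbidden, capacity=5):
        print(sorted(g))
```

In this toy input, S2 and S3 are not merged even though the pair has a positive closeness factor, because doing so would transitively place S1 and S3 under the same loop nest, which the assumed legality set forbids.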