Loop re-ordering and pre-fetching at run-time

Authors:
Suvas Vajracharya; Dirk Grunwald
Affiliations:
University of Colorado, Boulder, CO; University of Colorado, Boulder, CO
Venue:
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Year:
1997


Abstract

The order in which loop iterations are executed can have a large impact on the number of cache misses that an application incurs. A new loop order that preserves the semantics of the original order but achieves better cache data re-use improves the performance of that application. Several compiler techniques exist to transform loops so that the order of iterations reduces cache misses. This paper introduces a run-time method that determines the order through dependence-driven execution. In a dependence-driven execution, the program traverses the iteration space by following the dependence arcs between iterations.
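
The idea of traversing an iteration space along dependence arcs can be illustrated with a small sketch. This is not the paper's implementation; it is a minimal illustration under stated assumptions: a hypothetical two-dimensional loop nest whose body is a[i][j] = a[i-1][j] + a[i][j-1], hypothetical dependence vectors (1, 0) and (0, 1), and an arbitrarily chosen last-in-first-out ready list. The names sequential, dependence_driven, pending, and ready are all invented for the example.

```python
from collections import deque


def sequential(n, m):
    """Reference result: the original row-major loop nest."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def dependence_driven(n, m):
    """Run the same iterations in an order chosen at run time.

    An iteration becomes ready once every iteration it depends on has
    executed; finishing an iteration may enable its successors.
    """
    a = [[1] * m for _ in range(n)]
    # Assumed dependence vectors for this loop body (illustrative only).
    deps = [(1, 0), (0, 1)]

    def in_space(i, j):
        return 1 <= i < n and 1 <= j < m

    # pending[(i, j)] counts predecessors that have not yet executed.
    pending = {}
    for i in range(1, n):
        for j in range(1, m):
            pending[(i, j)] = sum(in_space(i - di, j - dj) for di, dj in deps)

    # Iterations with no unsatisfied dependences can start immediately.
    ready = deque(p for p, count in pending.items() if count == 0)
    order = []
    while ready:
        # LIFO choice (an assumption): continue from the iteration that
        # was enabled most recently, whose inputs were just written.
        i, j = ready.pop()
        a[i][j] = a[i - 1][j] + a[i][j - 1]
        order.append((i, j))
        # Follow the dependence arcs to the successor iterations.
        for di, dj in deps:
            succ = (i + di, j + dj)
            if in_space(*succ):
                pending[succ] -= 1
                if pending[succ] == 0:
                    ready.append(succ)
    return a, order
```

Calling dependence_driven(5, 6) returns the computed array together with the order in which iterations ran. Because an iteration is released only after all of its predecessors have finished, the result matches sequential(5, 6) even though the visiting order differs from the row-major one. The policy used to pick the next ready iteration is where a run-time system could favor cache re-use; the LIFO choice here is only a placeholder for such a policy.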