VLSI array processors
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
The design and analysis of spatial data structures
The design and analysis of spatial data structures
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Chores: enhanced run-time support for shared-memory parallel computing
ACM Transactions on Computer Systems (TOCS)
Link-time optimization of address calculation on a 64-bit architecture
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles
On shrinking binary picture patterns
Communications of the ACM
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Compiler Transformations for High-Performance Computing
Compiler Transformations for High-Performance Computing
Optimizing supercompilers for supercomputers
Optimizing supercompilers for supercomputers
Dependence driven execution for multiprogrammed multiprocessor
ICS '98 Proceedings of the 12th international conference on Supercomputing
SMARTS: exploiting temporal locality and parallelism through vertical execution
ICS '99 Proceedings of the 13th international conference on Supercomputing
Asynchronous Resource Management
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Hi-index | 0.00 |
The order in which loop iterations are executed can have a large impact on the number of cache misses that an applications takes. A new loop order that preserves the semantics of the old order but has a better cache data re-use, improves the performance of that application. Several compiler techniques exist to transform loops such that the order of iterations reduces cache misses. This paper introduces a run-time method to determine the order based on a dependence-driven execution. In a dependence-driven execution, an execution traverses the iteration space by following the dependence arcs between the iterations.