A Characterization of Temporal Locality and Its Portability across Memory Hierarchies

Authors:
Gianfranco Bilardi;Enoch Peserico
Affiliations:
-;-
Venue:
ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming,
Year:
2001

Citing 14
Cited 6

Amortized efficiency of list update and paging rules

Communications of the ACM
A model for hierarchical memory

STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
Communication complexity of PRAMs

Theoretical Computer Science - Special issue: Fifteenth international colloquium on automata, languages and programming, Tampere, Finland, July 1988
Cache and memory hierarchy design: a performance-directed approach

Cache and memory hierarchy design: a performance-directed approach
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology

ICS '97 Proceedings of the 11th international conference on Supercomputing
Computer architecture (2nd ed.): a quantitative approach

Computer architecture (2nd ed.): a quantitative approach
Computational power of pipelined memory hierarchies

Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Models of Computation: Exploring the Power of Computing

Models of Computation: Exploring the Power of Computing
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
A Characterization of Temporal Locality and Its Portability across Memory Hierarchies

ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming,
External Memory Algorithms

ESA '98 Proceedings of the 6th Annual European Symposium on Algorithms
Cache-Oblivious Algorithms

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
I/O complexity: The red-blue pebble game

STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
Hierarchical memory with block transfer

SFCS '87 Proceedings of the 28th Annual Symposium on Foundations of Computer Science

A Characterization of Temporal Locality and Its Portability across Memory Hierarchies

ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming,
Seamless Integration of Parallelism and Memory Hierarchy

ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance

WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
Translating submachine locality into locality of reference

Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium
Cache-Oblivious Algorithms

ACM Transactions on Algorithms (TALG)
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper formulates and investigates the question of whether a given algorithm can be coded in a way efficiently portable across machines with different hierarchical memory systems, modeled as a(x)-HRAMs (Hierarchical RAMs), where the time to access a location x is a(x). The width decomposition framework is proposed to provide a machine-independent characterization of temporal locality of a computation by a suitable set of space reuse parameters. Using this framework, it is shown that, when the schedule, i.e. the order by which operations are executed, is fixed, efficient portability is achievable. We propose (a) the decomposition-tree memory manager, which achieves time within a logarithmic factor of optimal on all HRAMs, and (b) the reoccurrence-width memory manager, which achieves time within a constant factor of optimal for the important class of uniform HRAMs. We also show that, when the schedule is considered as a degree of freedom of the implementation, there are computations whose optimal schedule does vary with the access function. In particular, we exhibit some computations for which any schedule is bound to be a polynomial factor slower than optimal on at least one of two sufficiently different machines. On the positive side, we show that relatively few schedules are sufficient to provide a near optimal solution on a wide class of HRAMs.