A Characterization of Temporal Locality and Its Portability across Memory Hierarchies
ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming
This paper formulates and investigates the question of whether a given algorithm can be coded in a way that is efficiently portable across machines with different hierarchical memory systems, modeled as a(x)-HRAMs (Hierarchical RAMs), where the time to access a location x is a(x). The width decomposition framework is proposed to provide a machine-independent characterization of the temporal locality of a computation by a suitable set of space reuse parameters. Using this framework, it is shown that, when the schedule, i.e. the order in which operations are executed, is fixed, efficient portability is achievable. We propose (a) the decomposition-tree memory manager, which achieves time within a logarithmic factor of optimal on all HRAMs, and (b) the reoccurrence-width memory manager, which achieves time within a constant factor of optimal for the important class of uniform HRAMs. We also show that, when the schedule is considered as a degree of freedom of the implementation, there are computations whose optimal schedule varies with the access function. In particular, we exhibit some computations for which any schedule is bound to be a polynomial factor slower than optimal on at least one of two sufficiently different machines. On the positive side, we show that relatively few schedules are sufficient to provide a near-optimal solution on a wide class of HRAMs.
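As an illustration of the cost model only, the following minimal sketch charges a(x) for each access to location x and sums over an address trace. The function names, the 0-based addressing, the offsets inside the two access functions, and the example traces are all assumptions made for this sketch; it does not implement the decomposition-tree or reoccurrence-width memory managers.

```python
import math

def hram_time(trace, a):
    """Total access time of an address trace when accessing x costs a(x)."""
    return sum(a(x) for x in trace)

# Two hypothetical access functions; the offsets keep the cost positive
# at address 0 and are an assumption of this sketch.
def log_access(x):
    return math.log2(x + 2)

def sqrt_access(x):
    return math.sqrt(x + 1)

# Two hypothetical traces with the same number of accesses:
# one stays in low addresses, the other keeps touching a far address.
trace_local = [0, 1] * 4
trace_spread = [0, 1000] * 4

for name, a in [("log", log_access), ("sqrt", sqrt_access)]:
    local = hram_time(trace_local, a)
    spread = hram_time(trace_spread, a)
    # The penalty for touching far addresses depends on the access function.
    print(name, round(local, 2), round(spread, 2), round(spread / local, 2))
```

Under these assumptions the spread trace is slower on both machines, but the slowdown ratio is much larger for the square-root access function than for the logarithmic one, which gives some intuition for why where data is placed, and in which order it is touched, matters differently on different hierarchies.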