Amortized efficiency of list update and paging rules. Communications of the ACM.
A model for hierarchical memory. STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing.
The input/output complexity of sorting and related problems. Communications of the ACM.
ACM Transactions on Computer Systems (TOCS).
Evaluating Associativity in CPU Caches. IEEE Transactions on Computers.
Cache and memory hierarchy design: a performance-directed approach.
The cache performance and optimizations of blocked algorithms. ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems.
Large-scale sorting in uniform memory hierarchies. Journal of Parallel and Distributed Computing - Special issue on parallel I/O systems.
Influence of cross-interferences on blocked loops: a case study with matrix-vector multiply. ACM Transactions on Programming Languages and Systems (TOPLAS).
Simple randomized mergesort on parallel disks. Parallel Computing - Special double issue: parallel I/O.
Asymptotically Tight Bounds for Performing BMMC Permutations on Parallel Disk Systems. SIAM Journal on Computing.
Nonlinear array layouts for hierarchical memory systems. ICS '99 Proceedings of the 13th international conference on Supercomputing.
Recursive array layouts and fast parallel matrix multiplication. Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures.
External-memory graph algorithms. Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms.
The influence of caches on the performance of sorting. SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms.
Cache performance analysis of traversals and random accesses. Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms.
Tuning Strassen's matrix multiplication for memory efficiency. SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing.
Extending the Hong-Kung Model to Memory Hierarchies. COCOON '95 Proceedings of the First Annual International Conference on Computing and Combinatorics.
Towards an Optimal Bit-Reversal Permutation Program. FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science.
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science.
I/O complexity: The red-blue pebble game. STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing.
ACM SIGGRAPH 2005 Papers.
An analytical model for cache replacement policy performance. SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems.
Translating submachine locality into locality of reference. Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium.
A memory model for scientific algorithms on graphics processors. Proceedings of the 2006 ACM/IEEE conference on Supercomputing.
Combating I-O bottleneck using prefetching: model, algorithms, and ramifications. The Journal of Supercomputing.
On the limits of cache-oblivious rational permutations. Theoretical Computer Science.
Algorithms and data structures for external memory. Foundations and Trends® in Theoretical Computer Science.
Massive model visualization techniques: course notes. ACM SIGGRAPH 2008 classes.
A Bridging Model for Multi-core Computing. ESA '08 Proceedings of the 16th annual European symposium on Algorithms.
Evaluating multicore algorithms on the unified memory model. Scientific Programming - Software Development for Multi-core Computing Systems.
Cache-oblivious simulation of parallel programs. IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing.
ACM Transactions on Algorithms (TALG).
Algorithmic ramifications of prefetching in memory hierarchy. HiPC'06 Proceedings of the 13th international conference on High Performance Computing.
Sequences of radius k: how to fetch many huge objects into small memory for pairwise computations. ISAAC'04 Proceedings of the 15th international conference on Algorithms and Computation.
A parallel page cache: IOPS and caching for multicore systems. HotStorage'12 Proceedings of the 4th USENIX conference on Hot Topics in Storage and File Systems.
Toward millions of file system IOPS on low-cost, commodity hardware. SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis.
We present a model that enables us to analyze the running time of an algorithm on a computer whose memory hierarchy has limited associativity, in terms of various cache parameters. Our cache model, an extension of Aggarwal and Vitter's I/O model, enables us to establish useful relationships between the cache complexity and the I/O complexity of computations. As a corollary, we obtain cache-efficient algorithms in the single-level cache model for fundamental problems such as sorting, FFT, and an important subclass of permutations. We also analyze the average-case cache behavior of mergesort, show that ignoring associativity concerns can lead to inferior performance, and present supporting experimental evidence. We further extend our model to multiple levels of cache with limited associativity and present optimal algorithms for matrix transpose and sorting. Our techniques may be used to exploit the memory hierarchy systematically, starting from the algorithm design stage, and to deal with the hitherto unresolved problem of limited associativity.