Towards a theory of cache-efficient algorithms

Authors:
Sandeep Sen; Siddhartha Chatterjee; Neeraj Dumir
Affiliations:
Indian Institute of Technology Delhi, New Delhi, India; IBM Research, Yorktown Heights, New York; Indian Institute of Technology Delhi, New Delhi, India
Venue:
Journal of the ACM (JACM)
Year:
2002



Abstract

We present a model that enables us to analyze the running time of an algorithm on a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our cache model, an extension of Aggarwal and Vitter's I/O model, enables us to establish useful relationships between the cache complexity and the I/O complexity of computations. As a corollary, we obtain cache-efficient algorithms in the single-level cache model for fundamental problems like sorting, FFT, and an important subclass of permutations. We also analyze the average-case cache behavior of mergesort, show that ignoring associativity concerns could lead to inferior performance, and present supporting experimental evidence. We further extend our model to multiple levels of cache with limited associativity and present optimal algorithms for matrix transpose and sorting. Our techniques may be used for systematic exploitation of the memory hierarchy starting from the algorithm design stage, and for dealing with the hitherto unresolved problem of limited associativity.
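
As a rough illustration of why limited associativity matters, the following toy simulation counts misses in a set-associative cache with LRU replacement while two arrays are read in lockstep, much as a two-way merge would read its inputs. This is a sketch under stated assumptions, not the model or the algorithms from the paper: the function names, the cache geometry (512 words, 8-word blocks), and the array placements are all hypothetical choices made for the example.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, block_size):
    """Count misses of a set-associative LRU cache over a stream of word addresses.

    Hypothetical toy model: ways=1 gives a direct-mapped cache.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_size
        lines = sets[block % num_sets]      # set chosen by the low-order block bits
        if block in lines:
            lines.move_to_end(block)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)   # evict the least recently used block
            lines[block] = True
    return misses


def interleaved_scan(base_a, base_b, n):
    """Yield addresses a[0], b[0], a[1], b[1], ... for two arrays of n words."""
    for i in range(n):
        yield base_a + i
        yield base_b + i


if __name__ == "__main__":
    n = 512
    # Direct-mapped, arrays exactly one cache size apart: every access conflicts.
    print(count_misses(interleaved_scan(0, 512, n), num_sets=64, ways=1, block_size=8))  # 1024
    # Same cache, second array shifted by one block: only cold misses remain.
    print(count_misses(interleaved_scan(0, 520, n), num_sets=64, ways=1, block_size=8))  # 128
    # Same capacity with 2-way associativity tolerates the original placement.
    print(count_misses(interleaved_scan(0, 512, n), num_sets=32, ways=2, block_size=8))  # 128
```

In this toy setting the total cache capacity is identical in all three runs. Only the mapping of blocks to sets changes, which is the kind of effect an analysis that ignores associativity would not capture.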