I/O complexity: The red-blue pebble game

Authors:
Hong Jia-Wei; H. T. Kung
Venue:
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
Year:
1981

Citing 2
Cited 107

The art of computer programming, volume 3: (2nd ed.) sorting and searching

The art of computer programming, volume 3: (2nd ed.) sorting and searching
On Time Versus Space

Journal of the ACM (JACM)

Memory requirements for balanced computer architectures

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
A model for hierarchical memory

STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
The input/output complexity of sorting and related problems

Communications of the ACM
Virtual memory algorithms

STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
Circuits and local computation

STOC '89 Proceedings of the twenty-first annual ACM symposium on Theory of computing
Tradeoffs between communication and space

STOC '89 Proceedings of the twenty-first annual ACM symposium on Theory of computing
An Evaluation of Multiple-Disk I/O Systems

IEEE Transactions on Computers
The input/output complexity of transitive closure

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Optimal disk I/O with parallel block transfer

STOC '90 Proceedings of the twenty-second annual ACM symposium on Theory of computing
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Large-scale sorting in parallel memories (extended abstract)

SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
I/O Overhead and Parallel VLSI Architectures for Lattice Computations

IEEE Transactions on Computers
Deterministic distribution sort in shared and distributed memory multiprocessors

SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Greed sort: optimal deterministic sorting on parallel disks

Journal of the ACM (JACM)
Memory bandwidth limitations of future microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Simple randomized mergesort on parallel disks

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
μDatabase: parallelism in a memory-mapped environment (research summary)

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
A quantitative comparison of parallel computation models

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Strategic directions in research in theory of computing

ACM Computing Surveys (CSUR) - Special ACM 50th-anniversary issue: strategic directions in computing research
Designing a Scalable Processor Array for Recurrent Computations

IEEE Transactions on Parallel and Distributed Systems
External memory algorithms

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
A quantitative comparison of parallel computation models

ACM Transactions on Computer Systems (TOCS)
Graph-theoretic methods in database theory

PODS '90 Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
A fast Fourier transform compiler

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Towards a theory of cache-efficient algorithms

SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
External memory algorithms and data structures: dealing with massive data

ACM Computing Surveys (CSUR)
On optimal temporal locality of stencil codes

Proceedings of the 2002 ACM symposium on Applied computing
The design of I/O-efficient sparse direct solvers

Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Towards a theory of cache-efficient algorithms

Journal of the ACM (JACM)
Optimizing Graph Algorithms for Improved Cache Performance

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
I/O-Space Trade-Offs

SWAT '00 Proceedings of the 7th Scandinavian Workshop on Algorithm Theory
A Characterization of Temporal Locality and Its Portability across Memory Hierarchies

ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming
Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance

WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
On the Space and Access Complexity of Computation DAGs

WG '00 Proceedings of the 26th International Workshop on Graph-Theoretic Concepts in Computer Science
External Memory Algorithms

ESA '98 Proceedings of the 6th Annual European Symposium on Algorithms
An Analytical Evaluation of Tiling for Stencil Codes with Time Loop

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
External memory algorithms

Handbook of massive data sets
A Theoretical Framework for Memory-Adaptive Algorithms

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Cache-Oblivious Algorithms

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Use of VLSI in algebraic computation: Some suggestions

SYMSAC '81 Proceedings of the fourth ACM symposium on Symbolic and algebraic computation
Domain-Specific Modeling for Rapid Energy Estimation of Reconfigurable Architectures

The Journal of Supercomputing
A fast Fourier transform compiler

ACM SIGPLAN Notices - Best of PLDI 1979-1999
On Scheduling Mesh-Structured Computations for Internet-Based Computing

IEEE Transactions on Computers
Optimizing Graph Algorithms for Improved Cache Performance

IEEE Transactions on Parallel and Distributed Systems
Guidelines for Scheduling Some Common Computation-Dags for Internet-Based Computing

IEEE Transactions on Computers
Parallel scheduling of complex dags under uncertainty

Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
Statistical Models for Empirical Search-Based Performance Tuning

International Journal of High Performance Computing Applications
Cache oblivious stencil computations

Proceedings of the 19th annual international conference on Supercomputing
High Performance Linear Algebra Operations on Reconfigurable Systems

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Cache-oblivious dynamic programming

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
The cache complexity of multithreaded cache oblivious algorithms

Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Cache-Friendly implementations of transitive closure

Journal of Experimental Algorithmics (JEA)
The memory behavior of cache oblivious stencil computations

The Journal of Supercomputing
Optimal sparse matrix dense vector multiplication in the I/O-model

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
An experimental comparison of cache-oblivious and cache-conscious programs

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
A parallel dynamic programming algorithm on a multi-core architecture

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems

IEEE Transactions on Parallel and Distributed Systems
The VLSI Complexity of Sorting

IEEE Transactions on Computers
Matrix product on heterogeneous master-worker platforms

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Hierarchical memory with block transfer

SFCS '87 Proceedings of the 28th Annual Symposium on Foundations of Computer Science
Combating I-O bottleneck using prefetching: model, algorithms, and ramifications

The Journal of Supercomputing
Algorithms and data structures for external memory

Foundations and Trends® in Theoretical Computer Science
A Bridging Model for Multi-core Computing

ESA '08 Proceedings of the 16th annual European symposium on Algorithms
A unified model for multicore architectures

IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
On approximating the ideal random access machine by physical machines

Journal of the ACM (JACM)
Simultaneous minimization of capacity and conflict misses

Journal of Computer Science and Technology
Communication-optimal parallel and sequential Cholesky decomposition: extended abstract

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Cache-optimal algorithms for option pricing

ACM Transactions on Mathematical Software (TOMS)
Evaluating multicore algorithms on the unified memory model

Scientific Programming - Software Development for Multi-core Computing Systems
Algorithms for memory hierarchies: advanced lectures

Algorithms for memory hierarchies: advanced lectures
Algorithmic techniques for memory energy reduction

WEA'03 Proceedings of the 2nd international conference on Experimental and efficient algorithms
Is cache-oblivious DGEMM viable?

PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Brief announcement: Lower bounds on communication for sparse Cholesky factorization of a model problem

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Cache-Oblivious Dynamic Programming for Bioinformatics

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Evaluating non-square sparse bilinear forms on multiple vector pairs in the I/O-model

MFCS'10 Proceedings of the 35th international conference on Mathematical foundations of computer science
A bridging model for multi-core computing

Journal of Computer and System Sciences
Cache complexity and multicore implementation for univariate real root isolation

ACM Communications in Computer Algebra
Programming the memory hierarchy revisited: supporting irregular parallelism in sequoia

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Upper and lower I/O bounds for pebbling r-pyramids

IWOCA'10 Proceedings of the 21st international conference on Combinatorial algorithms
Graph expansion and communication costs of fast matrix multiplication: regular submission

Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Brief announcement: communication bounds for heterogeneous architectures

Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Balance principles for algorithm-architecture co-design

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism
Strong I/O lower bounds for binomial and FFT computation graphs

COCOON'11 Proceedings of the 17th annual international conference on Computing and combinatorics
Performance modeling for systematic performance tuning

State of the Practice Reports
Cache-Oblivious Algorithms

ACM Transactions on Algorithms (TALG)
Communication-optimal Parallel and Sequential Cholesky Decomposition

SIAM Journal on Scientific Computing
Algorithmic ramifications of prefetching in memory hierarchy

HiPC'06 Proceedings of the 13th international conference on High Performance Computing
A cache oblivious algorithm for matrix multiplication based on Peano's space filling curve

PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
The potential of on-chip multiprocessing for QCD machines

HiPC'05 Proceedings of the 12th international conference on High Performance Computing
The I/O complexity of sparse matrix dense matrix multiplication

LATIN'10 Proceedings of the 9th Latin American conference on Theoretical Informatics
A family of high-performance matrix multiplication algorithms

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
A pebble game for internet-based computing

Theoretical Computer Science
Upper and lower I/O bounds for pebbling r-pyramids

Journal of Discrete Algorithms
On the communication complexity of 3D FFTs and its implications for Exascale

Proceedings of the 26th ACM international conference on Supercomputing
Space-round tradeoffs for MapReduce computations

Proceedings of the 26th ACM international conference on Supercomputing
Cache-conscious scheduling of streaming applications

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Communication-optimal Parallel and Sequential QR and LU Factorizations

SIAM Journal on Scientific Computing
CALU: A Communication Optimal LU Factorization Algorithm

SIAM Journal on Matrix Analysis and Applications
Graph expansion and communication costs of fast matrix multiplication

Journal of the ACM (JACM)
Avoiding communication through a multilevel LU factorization

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
A lower bound technique for communication on BSP with application to the FFT

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Graph expansion analysis for communication costs of fast rectangular matrix multiplication

MedAlg'12 Proceedings of the First Mediterranean conference on Design and Analysis of Algorithms
Communication efficient Gaussian elimination with partial pivoting using a shape morphing data layout

Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Tight bounds for low dimensional star stencils in the external memory model

WADS'13 Proceedings of the 13th international conference on Algorithms and Data Structures
Communication costs of Strassen's matrix multiplication

Communications of the ACM

Quantified Score

Hi-index	0.05

Abstract

In this paper, the red-blue pebble game is proposed to model the input-output complexity of algorithms. Using the pebble game formulation, a number of lower bound results for the I/O requirement are proven. For example, it is shown that to perform the n-point FFT or the ordinary n×n matrix multiplication algorithm with O(S) memory, at least Ω(n log n / log S) or Ω(n³/√S), respectively, time is needed for the I/O. Similar results are obtained for algorithms for several other problems. All of the lower bounds presented are the best possible in the sense that they are achievable by certain decomposition schemes. Results of this paper may provide insight into the difficult task of balancing I/O and computation in special-purpose system designs. For example, for the n-point FFT, the lower bound on I/O time implies that an S-point device achieving a speed-up ratio of order log S over the conventional O(n log n) time implementation is all one can hope for.
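
The two bounds quoted in the abstract can be explored numerically. The Python sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, the constant factors hidden by Ω are omitted, and the tiled multiplication is a standard blocked decomposition assumed here as one example of a scheme whose I/O count grows like n³/√S. It counts word transfers for a schedule that keeps three b×b tiles in fast memory, with b chosen so that 3b² ≤ S.

```python
import math


def fft_io_bound(n: int, s: int) -> float:
    """Growth term n log n / log S for the n-point FFT (constants omitted)."""
    return n * math.log2(n) / math.log2(s)


def matmul_io_bound(n: int, s: int) -> float:
    """Growth term n^3 / sqrt(S) for ordinary n x n multiplication (constants omitted)."""
    return n ** 3 / math.sqrt(s)


def blocked_matmul_io(n: int, s: int) -> int:
    """Count word transfers of a tiled n x n multiply with fast memory of size s.

    Assumed schedule: one tile each of A, B and C resides in fast memory,
    so the tile side b satisfies 3 * b * b <= s. Each C tile is accumulated
    in fast memory and written back once.
    """
    b = math.isqrt(s // 3)
    if b < 1:
        raise ValueError("fast memory must hold at least three words")
    tiles = math.ceil(n / b)
    transfers = 0
    for i in range(tiles):
        rows = min(b, n - i * b)
        for j in range(tiles):
            cols = min(b, n - j * b)
            for k in range(tiles):
                inner = min(b, n - k * b)
                # Read one tile of A and one tile of B from slow memory.
                transfers += rows * inner + inner * cols
            # Write the finished tile of C back to slow memory.
            transfers += rows * cols
    return transfers


if __name__ == "__main__":
    n, s = 64, 768
    print("tiled transfers:", blocked_matmul_io(n, s))
    print("n^3 / sqrt(S):  ", round(matmul_io_bound(n, s)))
    print("FFT term:       ", fft_io_bound(1024, 32))
```

For n = 64 and S = 768 the tile side is b = 16, and the schedule performs 2n³/b + n² = 36864 transfers, a constant multiple of n³/√S. Doubling b (that is, quadrupling S) roughly halves the dominant read term, which matches the 1/√S dependence in the bound.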