Quantitative system performance: computer system analysis using queueing network models
Memory requirements for balanced computer architectures
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
The input/output complexity of sorting and related problems
Communications of the ACM
Estimating interlock and improving balance for pipelined architectures
Journal of Parallel and Distributed Computing
A bridging model for parallel computation
Communications of the ACM
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Modeling the benefits of mixed data and task parallelism
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Programming parallel algorithms
Communications of the ACM
The Parallel Evaluation of General Arithmetic Expressions
Journal of the ACM (JACM)
The data locality of work stealing
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Towards an energy complexity of computation
Information Processing Letters - Special issue in honor of Edsger W. Dijkstra
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
I/O complexity: The red-blue pebble game
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
Communications of the ACM - Voting systems
Communication lower bounds for distributed-memory matrix multiplication
Journal of Parallel and Distributed Computing
An experimental comparison of cache-oblivious and cache-conscious programs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Scheduling threads for constructive cache sharing on CMPs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
A metric space for computer programs and the principle of computational least action
The Journal of Supercomputing
3D-Stacked Memory Architectures for Multi-core Processors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A Bridging Model for Multi-core Computing
ESA '08 Proceedings of the 16th annual European symposium on Algorithms
Amdahl's Law in the Multicore Era
Computer
Validity of the single processor approach to achieving large scale computing capabilities
AFIPS '67 (Spring) Proceedings of the April 18-20, 1967, spring joint computer conference
Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM - A Direct Path to Dependable Software
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
Proceedings of the 36th annual international symposium on Computer architecture
Analysis of Parallel Algorithms for Energy Conservation in Scalable Multicore Architectures
ICPP '09 Proceedings of the 2009 International Conference on Parallel Processing
Model-driven autotuning of sparse matrix-vector multiply on GPUs
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Low depth cache-oblivious algorithms
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
A quantitative performance analysis model for GPU architectures
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
On the communication complexity of 3D FFTs and its implications for Exascale
Proceedings of the 26th ACM international conference on Supercomputing
How much (execution) time and energy does my algorithm cost?
XRDS: Crossroads, The ACM Magazine for Students - Scientific Computing
We consider the problem of "co-design," by which we mean how to design computational algorithms for particular hardware architectures and vice versa. Our position is that balance principles should drive the co-design process. A balance principle is a theoretical constraint equation that explicitly relates algorithm parameters to hardware parameters according to some figure of merit, such as speed, power, or cost. This notion originates in the work of Kung (1986); Callahan, Cocke, and Kennedy (1988); and McCalpin (1995); however, we reinterpret these classical notions of balance in a modern context of parallel and I/O-efficient algorithm design, as well as trends in emerging architectures. From such a principle, we argue that one can better understand algorithm and hardware trends, and furthermore gain insight into how to improve both algorithms and hardware. For example, we suggest that although matrix multiply is currently compute-bound, it will become memory-bound in as few as ten years, even if last-level caches continue to grow at their current rates. Our overall aim is to suggest how to co-design rigorously and quantitatively while still yielding intuition and insight.
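For concreteness, the following is a minimal sketch (not necessarily the paper's exact formulation) of what such a constraint equation can look like for dense matrix multiply, using the classical Hong-Kung I/O lower bound. Here $n$ is the matrix dimension, $R_{\text{peak}}$ the peak arithmetic rate (flop/s), $\beta$ the main-memory bandwidth (words/s), and $Z$ the last-level cache capacity (words); this notation is assumed for illustration only.

\[
T_{\text{comp}} \approx \frac{2n^3}{R_{\text{peak}}},
\qquad
T_{\text{mem}} \gtrsim \frac{1}{\beta}\cdot\Theta\!\left(\frac{n^3}{\sqrt{Z}}\right),
\]
\[
\text{compute-bound} \;\iff\; T_{\text{comp}} \ge T_{\text{mem}} \;\iff\; \sqrt{Z} \gtrsim \frac{R_{\text{peak}}}{\beta} \quad \text{(up to constant factors)}.
\]

In words, keeping matrix multiply compute-bound requires the cache capacity $Z$ to grow roughly as the square of the machine's flop-to-bandwidth ratio; if $R_{\text{peak}}/\beta$ grows faster than $\sqrt{Z}$, the kernel eventually becomes memory-bound, which is the kind of trend-driven conclusion a balance principle makes precise.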