A bridging model for multi-core computing

Authors:
Leslie G. Valiant
Affiliations:
School of Engineering and Applied Sciences, Harvard University, United States
Venue:
Journal of Computer and System Sciences
Year:
2011

Abstract

Writing software for one parallel system is a feasible though arduous task. Reusing the substantial intellectual effort so expended for programming a second system has proved much more challenging. In sequential computing, algorithms textbooks and portable software are resources that enable software systems to be written that are efficiently portable across changing hardware platforms. These resources are currently lacking in the area of multi-core architectures, where a programmer seeking high performance has no comparable opportunity to build on the intellectual efforts of others. To address this problem, we propose a bridging model aimed at capturing the most basic resource parameters of multi-core architectures. We suggest that the considerable intellectual effort needed for designing efficient algorithms for such architectures may be most fruitfully expended in designing portable algorithms, once and for all, for such a bridging model. Portable algorithms would contain efficient designs for all reasonable combinations of the basic resource parameters and input sizes, and would form the basis for implementation or compilation for particular machines. Our Multi-BSP model is a multi-level model that has explicit parameters for processor numbers, memory/cache sizes, communication costs, and synchronization costs. The lowest level corresponds to shared memory or the PRAM, acknowledging the relevance of that model within whatever limits on memory and processor numbers allow it to be emulated efficaciously. We propose parameter-aware portable algorithms that run efficiently on all relevant architectures with any number of levels and any combination of parameters. For these algorithms we define a parameter-free notion of optimality.
We show that for several fundamental problems, including standard matrix multiplication, the Fast Fourier Transform, and comparison sorting, there exist optimal portable algorithms in that sense, for all combinations of machine parameters. Thus some algorithmic generality and elegance can be found in this many-parameter setting.
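
To make the four kinds of per-level parameters concrete, the following is a minimal Python sketch of how a multi-level machine description might be represented. It is an illustration only, not the paper's formal definition: the names `Level`, `total_processors`, and `superstep_cost` are hypothetical, the numeric values are made up, the assumption that each level groups a fixed number of components from the level below is ours, and the cost formula is the simple classical BSP-style charge (local work, plus communication cost per word times words moved, plus synchronization cost) rather than the full multi-level accounting.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass(frozen=True)
class Level:
    """One level of a hypothetical multi-level machine description."""
    p: int    # number of sub-components (processors at the lowest level)
    g: float  # assumed communication cost per word moved to the next level up
    L: float  # assumed cost of synchronizing this level's sub-components
    m: int    # memory/cache size at this level, in words


def total_processors(levels: List[Level]) -> int:
    """Leaf processor count, assuming each level groups p copies of the level below."""
    return prod(level.p for level in levels)


def superstep_cost(level: Level, work: float, words: float) -> float:
    """Simplified BSP-style charge for one superstep: work + g * words + L."""
    return work + level.g * words + level.L


# Illustrative two-level machine (all numbers are invented for the example):
# 4 cores sharing a cache, and 8 such chips sharing main memory.
machine = [
    Level(p=4, g=1.0, L=10.0, m=8192),
    Level(p=8, g=4.0, L=100.0, m=2**20),
]

cores = total_processors(machine)
cost = superstep_cost(machine[0], work=1000.0, words=50.0)
```

A portable algorithm in the abstract's sense would take a description like `machine` as input and choose its blocking and communication pattern from the parameters, instead of being rewritten for each architecture.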