Programming many-core architectures - a case study: dense matrix computations on the Intel single-chip cloud computer processor

Authors:
Bryan Marker; Ernie Chan; Jack Poulson; Robert van de Geijn; Rob F. Van der Wijngaart; Timothy G. Mattson; Theodore E. Kubaska
Affiliations:
Dept. of Computer Science, The Univ. of Texas at Austin, Austin, Texas 78712; Dept. of Computer Science, The Univ. of Texas at Austin, Austin, Texas 78712; Institute for Computational Engineering and Sciences, The Univ. of Texas at Austin, Austin, Texas 78712; Dept. of Computer Science, The Univ. of Texas at Austin, Austin, Texas 78712; Intel Corporation, Santa Clara, California 95054; Intel Corporation, DuPont, Washington 98327; Intel Corporation, Hillsboro, Oregon 97124
Venue:
Concurrency and Computation: Practice & Experience
Year:
2012

References (17):

An extended set of FORTRAN basic linear algebra subprograms. ACM Transactions on Mathematical Software (TOMS)
A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software (TOMS)
The torus-wrap mapping for dense matrix calculations on massively parallel computers. SIAM Journal on Scientific Computing
Using PLAPACK: parallel linear algebra package
Basic Linear Algebra Subprograms for Fortran Usage. ACM Transactions on Mathematical Software (TOMS)
FLAME: Formal Linear Algebra Methods Environment. ACM Transactions on Mathematical Software (TOMS)
Pentium Processor System Architecture
The Hierarchical Factor Algorithm for All-to-All Communication (Research Note). Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Send-receive considered harmful: Myths and realities of message passing. ACM Transactions on Programming Languages and Systems (TOPLAS)
The science of deriving dense linear algebra algorithms. ACM Transactions on Mathematical Software (TOMS)
Representing linear algebra algorithms in code: the FLAME application program interfaces. ACM Transactions on Mathematical Software (TOMS)
Collective communication: theory, practice, and experience: Research Articles. Concurrency and Computation: Practice & Experience
Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Transactions on Mathematical Software (TOMS)
The 48-core SCC Processor: the Programmer's View. Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Light-weight communications on Intel's single-chip cloud computer processor. ACM SIGOPS Operating Systems Review
Optimizing power using transformations. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Elemental: A New Framework for Distributed Memory Dense Matrix Computations. ACM Transactions on Mathematical Software (TOMS)

Cited by (1):

Elemental: A New Framework for Distributed Memory Dense Matrix Computations. ACM Transactions on Mathematical Software (TOMS)

Abstract

A message-passing, distributed-memory parallel computer on a chip is one possible design for future many-core architectures. We discuss initial experiences with the Intel Single-chip Cloud Computer research processor, a prototype architecture with 48 cores on a single die that communicate via a small, shared, on-die buffer. The experiment is to port a state-of-the-art, distributed-memory, dense matrix library, Elemental, to this architecture and gain insight from the experience. We show that the library's attention to programmability, especially its proper abstraction for collective communication, greatly aids the porting effort. This abstraction enables us to support a wide range of functionality with limited changes to the library code.

Copyright © 2011 John Wiley & Sons, Ltd.