Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how the current state of hardware and software allows the programmability problem to be addressed without sacrificing performance. This rests on two realizations: memory is now cheap and plentiful, making it less critical to optimally orchestrate I/O, and recent algorithms view matrices as collections of submatrices and computation as operations over those submatrices. Together these allow libraries to be coded at a high level of abstraction, leaving the scheduling of computation and data movement to a runtime system. This stands in sharp contrast to more traditional approaches, which pursue optimal use of in-core memory and explicit overlap of I/O with computation at the expense of considerable programming complexity. Performance of the approach is demonstrated on multicore architectures as well as on platforms equipped with hardware accelerators.
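To make the "matrices as collections of submatrices" idea concrete, the following is a minimal sketch of a Cholesky factorization coded as an algorithm-by-blocks: the algorithm only *generates* a list of tasks over submatrices (named after the BLAS/LAPACK kernels POTRF, TRSM, and GEMM they correspond to), and a separate stage executes them. This is an illustrative toy, not the article's library or runtime: a real runtime would analyze the dependencies among the tasks and schedule them, together with the associated I/O, possibly out of order; here they are simply run in program order.

```python
import math

def potrf(A):
    """Unblocked Cholesky factor of one small block (in place, lower triangular)."""
    n = len(A)
    for j in range(n):
        for k in range(j):
            A[j][j] -= A[j][k] * A[j][k]
        A[j][j] = math.sqrt(A[j][j])
        for i in range(j + 1, n):
            for k in range(j):
                A[i][j] -= A[i][k] * A[j][k]
            A[i][j] /= A[j][j]
    for i in range(n):                  # zero the strictly upper part
        for j in range(i + 1, n):
            A[i][j] = 0.0

def trsm(L, B):
    """Triangular solve B := B * inv(L)^T, one row of B at a time."""
    for r in range(len(B)):
        for j in range(len(L)):
            for k in range(j):
                B[r][j] -= B[r][k] * L[j][k]
            B[r][j] /= L[j][j]

def gemm_nt(C, A, B):
    """Rank-b update C := C - A * B^T (covers the symmetric SYRK case too)."""
    for i in range(len(C)):
        for j in range(len(C[0])):
            C[i][j] -= sum(A[i][k] * B[j][k] for k in range(len(A[0])))

def cholesky_by_blocks(blocks, nb):
    """Right-looking Cholesky expressed as a list of tasks over submatrices.

    `blocks` maps (i, j), i >= j, to the (i, j) submatrix of a symmetric
    positive definite matrix partitioned into nb x nb square blocks.
    """
    tasks = []
    for k in range(nb):
        tasks.append(("POTRF", (k, k)))
        for i in range(k + 1, nb):
            tasks.append(("TRSM", (i, k), (k, k)))
        for i in range(k + 1, nb):
            for j in range(k + 1, i + 1):
                tasks.append(("GEMM", (i, j), (i, k), (j, k)))
    # A runtime system would schedule these tasks (and the data movement
    # behind them) out of order; this sketch executes them sequentially.
    for t in tasks:
        if t[0] == "POTRF":
            potrf(blocks[t[1]])
        elif t[0] == "TRSM":
            trsm(blocks[t[2]], blocks[t[1]])
        else:
            gemm_nt(blocks[t[1]], blocks[t[2]], blocks[t[3]])
    return tasks

# Example: a 4x4 SPD matrix partitioned into 2x2 blocks of order 2;
# only the lower triangle of blocks is stored.
blocks = {(0, 0): [[4.0, 2.0], [2.0, 5.0]],
          (1, 0): [[0.0, 2.0], [0.0, 0.0]],
          (1, 1): [[5.0, 2.0], [2.0, 5.0]]}
task_list = cholesky_by_blocks(blocks, 2)
```

The separation between generating tasks and executing them is the point of the design: swapping the sequential loop for a dependency-aware scheduler changes nothing in the algorithm's code, which is what lets such libraries stay at a high level of abstraction.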