On the performance of parallel factorization of out-of-core matrices

Authors:
Eddy Caron;Gil Utard
Affiliations:
GRAAL Project, INRIA Rhône Alpes, LIP Laboratory (UMR CNRS, ENS Lyon, INRIA, Univ. Claude Bernard Lyon 1), 46 Allée d'Italie, 69364 Lyon Cedex 07, France
Venue:
Parallel Computing
Year:
2004


Abstract

In this paper, we present an analytical performance model of the parallel left-right looking out-of-core LU factorization algorithm for cluster-like architectures. We show the accuracy of the model's performance predictions for the ScaLAPACK library. We analyze the overhead introduced by the out-of-core part of the algorithm and identify a limitation that had not previously been reported: for large problems, the algorithm has poor efficiency. This overhead is divided into an I/O part and a communication part. We derive an overlapping scheme, and the minimum memory it requires, to avoid the I/O overhead. The new scheme is validated by a prototype implementation in ScaLAPACK. We show the impact of the communication overhead on two-dimensional distributions. We then show that, with similar memory requirements, a second overlapping scheme may be implemented to avoid the communication overhead. If the size of the physical main memory is proportional to the matrix order (O(N) bytes), then the performance of the out-of-core algorithm is similar to that of the in-core algorithm, which requires O(N^2) bytes. The paper thus demonstrates that memory is not a limiting factor for the factorization of huge matrices.
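To give a feel for the gap between the O(N) and O(N^2) memory requirements mentioned in the abstract, the following is a rough sketch and not the paper's performance model. It assumes 8-byte double-precision entries and an out-of-core scheme that keeps a fixed number of N-by-b column panels in memory. The names `panel_width` and `panels_in_memory` and their values are hypothetical, chosen only for illustration; the actual constants derived in the paper are not stated in the abstract.

```python
BYTES_PER_DOUBLE = 8  # assumption: double-precision matrix entries


def in_core_bytes(n: int) -> int:
    """Bytes needed to hold the full n x n matrix in memory: O(N^2)."""
    return n * n * BYTES_PER_DOUBLE


def out_of_core_bytes(n: int, panel_width: int, panels_in_memory: int) -> int:
    """Bytes needed to hold a fixed number of n x panel_width panels: O(N).

    panel_width and panels_in_memory are hypothetical parameters for this
    sketch; they stay constant as n grows, so the footprint is linear in n.
    """
    return n * panel_width * panels_in_memory * BYTES_PER_DOUBLE


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        full = in_core_bytes(n)
        panels = out_of_core_bytes(n, panel_width=64, panels_in_memory=2)
        print(f"N={n:>9,}: in-core {full / 2**30:10.1f} GiB, "
              f"out-of-core panels {panels / 2**30:8.3f} GiB")
```

Under these assumptions, doubling N doubles the panel footprint but quadruples the in-core footprint, which is why a memory budget proportional to N can remain practical for matrix orders where storing the full matrix cannot.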