Task scheduling using a block dependency DAG for block-oriented sparse Cholesky factorization
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
Block-oriented sparse Cholesky factorization decomposes a sparse matrix into rectangular subblocks; each block can then be handled as a computational unit, increasing data reuse in a hierarchical memory system. The method also increases the degree of concurrency and reduces the overall communication volume, so it performs more efficiently on a distributed-memory multiprocessor than the customary column-oriented factorization. Until now, however, the mapping of blocks to processors has been designed for load balance under restricted communication patterns. In this paper, we represent tasks with a block dependency DAG that captures the execution behavior of block sparse Cholesky factorization on a distributed-memory system. Because the characteristics of these tasks differ from those of the conventional parallel task model, we propose a new task scheduling algorithm based on the block dependency DAG. The algorithm consists of two stages: early-start clustering and affined cluster mapping (ACM). The early-start clustering stage groups tasks into clusters while preserving each task's earliest start time and without limiting parallelism. The ACM stage then allocates clusters to processors, considering both communication cost and load balance. Experimental results on a Myrinet cluster system show that the proposed task scheduling approach outperforms other processor mapping methods.
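The two-stage structure described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the task names, weights, and the specific merge rule (zero the edge from one predecessor when that does not delay the task's earliest start time) are illustrative assumptions, and the mapping stage here weighs load balance only, whereas the paper's ACM stage also considers inter-cluster communication cost.

```python
from collections import defaultdict

def earliest_start_times(tasks, deps, cost, comm):
    """Earliest start time of each task, assuming every cross-task
    edge (u, t) pays the communication cost comm[(u, t)].
    `tasks` must be given in topological order."""
    est = {}
    for t in tasks:
        est[t] = max((est[u] + cost[u] + comm[(u, t)] for u in deps[t]),
                     default=0)
    return est

def early_start_clustering(tasks, deps, cost, comm):
    """Stage 1 (sketch): append a task to a predecessor's cluster,
    zeroing that edge's communication cost, only when the merge does
    not delay the task's earliest start time."""
    est = earliest_start_times(tasks, deps, cost, comm)
    cluster = {t: i for i, t in enumerate(tasks)}  # singleton clusters
    tail = {cluster[t]: t for t in tasks}          # last task per cluster
    for t in tasks:
        for u in deps[t]:
            if tail[cluster[u]] != u:
                continue  # cluster already extended by another task
            merged_start = max(
                (est[v] + cost[v] + (0 if v == u else comm[(v, t)])
                 for v in deps[t]), default=0)
            if merged_start <= est[t]:
                cluster[t] = cluster[u]
                est[t] = merged_start
                tail[cluster[u]] = t
                break
    return cluster

def affined_cluster_mapping(cluster, cost, n_procs):
    """Stage 2 (sketch): place clusters, heaviest first, on the
    currently least-loaded processor."""
    work = defaultdict(float)
    for t, c in cluster.items():
        work[c] += cost[t]
    load = [0.0] * n_procs
    placement = {}
    for c, w in sorted(work.items(), key=lambda kv: -kv[1]):
        p = min(range(n_procs), key=lambda i: load[i])
        placement[c] = p
        load[p] += w
    return placement

# Toy diamond DAG: a -> b, a -> c, b -> d, c -> d
tasks = ["a", "b", "c", "d"]
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
cost = {"a": 2, "b": 3, "c": 3, "d": 1}
comm = defaultdict(lambda: 1)  # unit cost on every edge
cl = early_start_clustering(tasks, deps, cost, comm)
pm = affined_cluster_mapping(cl, cost, 2)
print(cl)  # a, b, d share a cluster; c stays separate
print(pm)
```

On the toy DAG, the chain a-b-d is clustered to remove its communication delays while c remains a separate cluster, so the two clusters can then be mapped to different processors.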