Parallel algorithms for sparse linear systems
SIAM Review
A new parallel architecture for sparse matrix computation based on finite projective geometries
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
PYRROS: static task scheduling and code generation for message passing multiprocessors
ICS '92 Proceedings of the 6th international conference on Supercomputing
Performance of distributed sparse Cholesky factorization with pre-scheduling
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Exploiting the memory hierarchy in sequential and parallel sparse Cholesky factorization
Exploiting the memory hierarchy in sequential and parallel sparse Cholesky factorization
Techniques to overlap computation and communication in irregular iterative applications
ICS '94 Proceedings of the 8th international conference on Supercomputing
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
Remote queues: exposing message queues for optimization and atomicity
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Partitioning and Scheduling Parallel Programs for Multiprocessors
Partitioning and Scheduling Parallel Programs for Multiprocessors
Parallel Programming and Compilers
Parallel Programming and Compilers
Improved load distribution in parallel sparse cholesky factorization
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
On the Granularity and Clustering of Directed Acyclic Task Graphs
IEEE Transactions on Parallel and Distributed Systems
Experience with active messages on the Meiko CS-2
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Run-time Techniques for Exploiting Irregular Task Parallelism on Distributed Memory Architectures
Run-time Techniques for Exploiting Irregular Task Parallelism on Distributed Memory Architectures
Run-time compilation for parallel sparse matrix computations
ICS '96 Proceedings of the 10th international conference on Supercomputing
Efficient Sparse LU Factorization with Partial Pivoting on Distributed Memory Architectures
IEEE Transactions on Parallel and Distributed Systems
Sparse LU factorization with partial pivoting on distributed memory machines
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Journal of Real-Time Image Processing
Hi-index | 0.00 |
Many irregular scientific computing problems can be modeled by directed acyclic task graphs (DAGs). In this paper, we present an efficient run-time system for executing general asynchronous DAG schedules on distributed memory machines. Our solution tightly integrates the run-time scheme with a fast communication mechanism to eliminate unnecessary overhead in message buffering and copying, and takes advantage of task dependence properties to ensure the correctness of execution. We demonstrate the applications of this scheme in sparse LU and Cholesky factorizations for which actual speedups have been hard to obtain in the literature because parallelism in these problems is irregular and limited. Our experiments on Meiko CS-2 show the promising results of our approach in exploiting irregular task parallelism with mixed granularities.