Run-time scheduling and execution of loops on message passing machines. Journal of Parallel and Distributed Computing, special issue: algorithms for hypercube computers.
Program partitioning for NUMA multiprocessor computer systems. Journal of Parallel and Distributed Computing, special issue on performance of supercomputers.
List scheduling with and without communication delays. Parallel Computing.
Cilk: an efficient multithreaded runtime system. PPOPP '95: Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming.
Provably efficient scheduling for languages with fine-grained parallelism. Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures.
Modeling the benefits of mixed data and task parallelism. Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures.
Decoupling synchronization and data transfer in message passing systems of parallel computers. ICS '95: Proceedings of the 9th international conference on Supercomputing.
Run-time compilation for parallel sparse matrix computations. ICS '96: Proceedings of the 10th international conference on Supercomputing.
Run-time techniques for exploiting irregular task parallelism on distributed memory architectures. Journal of Parallel and Distributed Computing.
Sparse LU factorization with partial pivoting on distributed memory machines. Supercomputing '96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing.
Partitioning and Scheduling Parallel Programs for Multiprocessors.
Parallel Programming and Compilers.
Improved load distribution in parallel sparse Cholesky factorization. Proceedings of the 1994 ACM/IEEE conference on Supercomputing.
Automatic Extraction of Functional Parallelism from Ordinary Programs. IEEE Transactions on Parallel and Distributed Systems.
DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors. IEEE Transactions on Parallel and Distributed Systems.
Sparse Gaussian elimination on high-performance computers.
Efficient Sparse LU Factorization with Partial Pivoting on Distributed Memory Architectures. IEEE Transactions on Parallel and Distributed Systems.
Elimination forest guided 2D sparse LU factorization. Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures.
The design, implementation, and evaluation of Jade. ACM Transactions on Programming Languages and Systems (TOPLAS).
Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines. Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming.
Low Memory Cost Dynamic Scheduling of Large Coarse Grain Task Graphs. IPPS '98: Proceedings of the 12th International Parallel Processing Symposium.
Solving large problems is an important goal for parallel machines with multiple processors and memory resources. This paper addresses the efficient execution of overhead-sensitive, irregular parallel computations under memory constraints. The irregular parallelism is modeled by task dependence graphs with mixed granularities, and the trade-off between time efficiency and space efficiency is investigated. The main difficulty in designing efficient run-time support stems from the fast communication primitives available on modern parallel architectures: such primitives typically deposit data directly at an address on the receiving processor, so destination space must be allocated before the data arrives. A run-time active memory management scheme and new scheduling techniques are proposed to improve memory utilization while retaining good time efficiency, and a theoretical analysis of their correctness and performance is provided. This work is implemented in the RAPID system [5], which provides run-time support for parallelizing irregular code on distributed memory machines, and the effectiveness of the proposed techniques is verified on sparse Cholesky factorization and sparse LU factorization with partial pivoting. Experimental results on the Cray T3D show that solvable problem sizes increase substantially under limited memory capacities, while the loss of execution efficiency caused by the extra memory management overhead remains reasonable.
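To make the time/space trade-off concrete, here is a minimal Python sketch under simplifying assumptions: a single processor, tasks annotated with the data objects they touch, and known object sizes. It allocates each object just before its first use, frees it after its last user finishes, and only runs a ready task whose new allocations fit under a fixed capacity. The names `Task`, `schedule_with_memory_limit`, and `capacity` are hypothetical, and this is not the RAPID memory management or scheduling algorithm; it only illustrates the general idea of memory-aware ordering of a task dependence graph.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Task:
    name: str
    objects: List[str]                              # data objects read or written
    deps: List[str] = field(default_factory=list)   # predecessor task names


def schedule_with_memory_limit(tasks: List[Task], sizes: Dict[str, int],
                               capacity: int) -> Tuple[List[str], int]:
    """Return (execution order, peak memory) for a sequential schedule.

    Hypothetical sketch: objects are allocated on first use and freed after
    their last remaining user completes. The greedy choice may fail on graphs
    where some other order would fit; it raises RuntimeError in that case.
    """
    by_name = {t.name: t for t in tasks}
    remaining_users: Dict[str, int] = defaultdict(int)
    successors: Dict[str, List[str]] = defaultdict(list)
    indegree = {t.name: len(t.deps) for t in tasks}
    for t in tasks:
        for obj in set(t.objects):
            remaining_users[obj] += 1
        for d in t.deps:
            successors[d].append(t.name)

    live: Set[str] = set()
    used = peak = 0
    order: List[str] = []
    ready = [t.name for t in tasks if indegree[t.name] == 0]

    while ready:
        # Pick the first ready task whose not-yet-allocated objects fit.
        chosen = None
        for name in ready:
            extra = sum(sizes[o] for o in set(by_name[name].objects) - live)
            if used + extra <= capacity:
                chosen = name
                break
        if chosen is None:
            raise RuntimeError("no ready task fits in the memory capacity")
        ready.remove(chosen)
        task = by_name[chosen]

        for obj in set(task.objects) - live:        # allocate on demand
            live.add(obj)
            used += sizes[obj]
        peak = max(peak, used)
        order.append(chosen)

        for obj in set(task.objects):               # free after last use
            remaining_users[obj] -= 1
            if remaining_users[obj] == 0:
                live.discard(obj)
                used -= sizes[obj]

        for s in successors[chosen]:                # release successors
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order, peak


if __name__ == "__main__":
    sizes = {"A": 4, "B": 4, "C": 4, "D": 4}
    tasks = [Task("T1", ["A", "B"]), Task("T2", ["B"], ["T1"]),
             Task("T3", ["C", "D"]), Task("T4", ["D"], ["T3"])]
    print(schedule_with_memory_limit(tasks, sizes, capacity=8))
    # -> (['T1', 'T2', 'T3', 'T4'], 8)
```

In the usage example, allocating every object up front would need 16 units, while on-demand allocation with early freeing keeps the peak at 8: the scheduler defers `T3` until `T2` has released `B`. With `capacity=7`, no ready task fits and the sketch raises `RuntimeError`, which shows why a real run-time system needs an analysis of when a memory-constrained schedule is guaranteed to make progress.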