In this paper we study the problem of factoring the coefficient matrices of large sparse systems of equations on high-performance multiprocessor workstations. Although these workstations are capable of very high peak floating-point computation rates, most existing sparse factorization codes achieve only a small fraction of that potential. A major limiting factor is the cost of memory accesses. We describe a parallel factorization code that exploits the supernodal structure of the matrix to substantially reduce the number of memory references, and we propose enhancements that significantly reduce the overall cache miss rate. Together, these techniques greatly increase factorization performance. We present experimental results from runs on the Silicon Graphics 4D/380 multiprocessor. Using eight processors, the parallel supernodal code achieves a computation rate of approximately 40 MFLOPS when factoring a range of benchmark matrices, more than twice as fast as previously used parallel nodal approaches.
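To make the idea of "supernodal structure" concrete, here is a minimal sketch, not the paper's code. It assumes one common definition of a supernode: a run of consecutive columns of the factor L that share a dense diagonal block and identical nonzero structure below it. It also assumes a deliberately crude counting model in which we only count how often a destination column must be visited (loaded and stored) by updates coming from a different supernode. The function names `find_supernodes` and `target_visits` are hypothetical. Because every column in a supernode updates the same set of external targets, a nodal (column-by-column) scheme visits each target once per source column, while a supernodal scheme can apply the whole supernode in a single visit.

```python
from typing import List, Set, Tuple


def find_supernodes(col_struct: List[Set[int]]) -> List[Tuple[int, int]]:
    """Partition the columns of L into supernodes (half-open ranges).

    col_struct[j] holds the row indices i >= j with L[i, j] != 0,
    diagonal included. Column j joins the supernode of column j-1 when
    struct(j-1) == struct(j) | {j-1}, i.e. the two columns share a dense
    diagonal block and identical structure below it (assumed definition).
    """
    n = len(col_struct)
    supernodes: List[Tuple[int, int]] = []
    start = 0
    for j in range(1, n):
        if col_struct[j - 1] != col_struct[j] | {j - 1}:
            supernodes.append((start, j))  # close the current supernode
            start = j
    if n > 0:
        supernodes.append((start, n))
    return supernodes


def target_visits(col_struct: List[Set[int]],
                  supernodes: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Crude model: count destination-column visits for updates whose
    target lies outside the source supernode. Intra-supernode updates
    are ignored for both schemes.
    Returns (nodal_visits, supernodal_visits).
    """
    nodal = supernodal = 0
    for first, end in supernodes:
        # All columns of the supernode share these external target rows.
        targets = [j for j in col_struct[end - 1] if j >= end]
        nodal += (end - first) * len(targets)  # one visit per source column
        supernodal += len(targets)             # one visit per supernode
    return nodal, supernodal


if __name__ == "__main__":
    # Hypothetical 7x7 factor structure (closed under fill).
    L_struct = [{0, 1, 4, 5}, {1, 4, 5}, {2, 3, 4, 6}, {3, 4, 6},
                {4, 5, 6}, {5, 6}, {6}]
    sn = find_supernodes(L_struct)
    print(sn)                           # [(0, 2), (2, 4), (4, 7)]
    print(target_visits(L_struct, sn))  # (8, 4)
```

In this toy example each two-column supernode halves the destination-column traffic; larger supernodes give proportionally larger savings under the same model. Real supernodal codes gain further from keeping the destination in registers or cache while several source columns are applied, which this count does not attempt to simulate.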