In this paper we describe the parallelization of an interior-point method (IPM) aimed at achieving high scalability on large-scale chip multiprocessors (CMPs). IPM is an important computational technique used to solve optimization problems in many areas of science, engineering, and finance. IPM spends most of its computation time in a few sparse linear algebra kernels. While each of these kernels contains a large amount of parallelism, the sparse, irregular datasets found in many optimization problems make that parallelism difficult to exploit. As a result, prior work has reported relatively low speedups of 4X-12X on medium- to large-scale parallel machines. This paper proposes and evaluates several algorithmic and hardware features to improve IPM parallel performance on large-scale CMPs. Through detailed simulations, we demonstrate how exploiting multiple levels of parallelism, with hardware support for low-overhead task queues and parallel reduction, enables IPM to achieve up to 48X parallel speedup on a 64-core CMP.
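To make the two mechanisms concrete, the following is a minimal software sketch of a task queue feeding fine-grained sparse tasks into a parallel reduction, the pattern the abstract says benefits from hardware support. The toy sparse matrix, the `column_task` function, and the lock-based reduction are illustrative assumptions, not the paper's actual kernels; in the paper, the queue and the reduction would be accelerated in hardware rather than implemented with a thread pool and a lock.

```python
# Hypothetical sketch (not the paper's implementation): one small task per
# sparse column, all tasks accumulating into a shared output vector.
from concurrent.futures import ThreadPoolExecutor
import threading

# Toy sparse matrix A in a CSC-like layout: per column, (row indices, values).
columns = [
    ([0, 2], [1.0, 2.0]),
    ([1],    [3.0]),
    ([0, 3], [4.0, 5.0]),
]
x = [1.0, 2.0, 1.0]        # dense input vector
y = [0.0, 0.0, 0.0, 0.0]   # shared output accumulator for y = A @ x
y_lock = threading.Lock()  # software stand-in for hardware reduction support

def column_task(j):
    """One fine-grained task: scatter column j's contribution into y."""
    rows, vals = columns[j]
    local = [(r, v * x[j]) for r, v in zip(rows, vals)]
    # This contended accumulation is the step that hardware support for
    # parallel reduction is meant to accelerate.
    with y_lock:
        for r, contrib in local:
            y[r] += contrib

# The executor's internal work queue plays the role of the low-overhead
# hardware task queue: tasks are too small to amortize heavy scheduling.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(column_task, range(len(columns))))

print(y)  # the sparse matrix-vector product y = A @ x
```

With irregular sparsity, column tasks have very uneven cost, which is why a dynamic queue (rather than a static partition of columns across cores) is the natural scheduling choice here.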