Efficient Parallel Nonnegative Least Squares on Multicore Architectures

Authors:
Yuancheng Luo;Ramani Duraiswami
Affiliations:
yluo1@umd.edu and ramani@umiacs.umd.edu
Venue:
SIAM Journal on Scientific Computing
Year:
2011

Abstract

We parallelize, on multicore architectures, a version of the active-set iterative algorithm derived from the original work of Lawson and Hanson [Solving Least Squares Problems, Prentice-Hall, 1974]. At every iteration, this algorithm must solve an unconstrained least squares problem whose matrix consists of the passive columns of the original system matrix. To improve performance, we use parallelizable procedures that efficiently update and downdate the QR factorization of this matrix at each iteration, accounting for inserted and removed columns. We also reorder the columns in the decomposition to reduce computation and memory-access costs. We consider graphics processing units (GPUs) as a new mode of efficient parallel computation and compare our GPU implementations with multicore CPU implementations. The experiments use both synthetic and nonsynthetic data.