LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware

Authors:
Nico Galoppo;Naga K. Govindaraju;Michael Henson;Dinesh Manocha
Affiliations:
University of North Carolina at Chapel Hill;University of North Carolina at Chapel Hill;University of North Carolina at Chapel Hill;University of North Carolina at Chapel Hill
Venue:
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Year:
2005



Abstract

We present a novel algorithm to solve dense linear systems using graphics processors (GPUs). We reduce matrix decomposition and row operations to a series of rasterization problems on the GPU. These include new techniques for streaming index pairs, swapping rows and columns, and parallelizing the computation to utilize multiple vertex and fragment processors. We also use appropriate data representations to match the rasterization order and cache technology of graphics processors. We have implemented our algorithm on different GPUs and compared the performance with optimized CPU implementations. In particular, our implementation on an NVIDIA GeForce 7800 GPU outperforms a CPU-based ATLAS implementation. Moreover, our results show that our algorithm is cache and bandwidth efficient and scales well with the number of fragment processors within the GPU and the core GPU clock rate. We use our algorithm for fluid flow simulation and demonstrate that the commodity GPU is a useful co-processor for many scientific applications.
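
For orientation, the matrix decomposition named in the title can be sketched on the CPU as a textbook LU factorization with partial pivoting. This is only a reference sketch in plain Python, not the rasterization-based GPU method the abstract describes; the function names `lu_decompose` and `lu_solve` are hypothetical and chosen for illustration. The inner row-update loop is the kind of independent per-element work that a data-parallel formulation could plausibly distribute across processors.

```python
def lu_decompose(a):
    """Factor a square matrix as P*A = L*U with partial pivoting.

    Returns (lu, perm): lu packs the unit lower triangle L (below the
    diagonal) and U (on and above it); perm[i] is the original index of
    the row that ended up in position i.
    """
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]  # row swap
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate column k below the pivot; each row update is
        # independent of the others.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A*x = b given the packed factors from lu_decompose."""
    n = len(lu)
    # Forward substitution with the permuted right-hand side (L*y = P*b).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution (U*x = y).
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
    b = [4.0, 10.0, 24.0]
    lu, perm = lu_decompose(A)
    print(lu_solve(lu, perm, b))
```

Factoring once and reusing `lu` and `perm` for several right-hand sides keeps the cubic-cost step out of each subsequent solve.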