We report on our experience with integrating graphics processing units (GPUs) as fast parallel floating-point co-processors to accelerate two fundamental kernels of computational science: sparse direct factorization and nonlinear interior-point optimization. Since a full re-implementation of these complex kernels is typically not feasible, we identify matrix-matrix multiplication as a natural first entry point for a minimally invasive integration of GPUs. We investigate performance on the NVIDIA GeForce 8800, a multicore chip originally designed for demanding gaming applications, and exploit its architectural features to design an efficient GPU-parallel sparse matrix solver. We demonstrate a prototype approach that leverages the bandwidth and computing power of GPUs for these matrix kernel operations, reaching an overall performance of over 110 GFlop/s on the desktop for large matrices and over 38 GFlop/s for sparse matrices arising in real applications. We apply our GPU algorithm to PDE-constrained optimization problems and show that a commodity GPU is a useful co-processor for scientific applications.
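The idea of using matrix-matrix multiplication as the single entry point can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`cpu_gemm`, `make_dispatcher`, `schur_update`, `min_flops`) is hypothetical, the pure-Python product stands in for a BLAS call, and the `device` argument stands in for whatever GPU routine an actual solver would call. The sketch only shows the shape of the integration: the factorization code keeps calling one GEMM function, and a dispatcher decides per call whether the product is large enough to offload.

```python
from typing import Callable, List

Matrix = List[List[float]]
Gemm = Callable[[Matrix, Matrix], Matrix]


def cpu_gemm(a: Matrix, b: Matrix) -> Matrix:
    """Reference dense product A * B in pure Python (stand-in for host BLAS)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def gemm_flops(m: int, n: int, k: int) -> int:
    """Conventional operation count for an (m x k) times (k x n) product."""
    return 2 * m * n * k


def make_dispatcher(host: Gemm, device: Gemm, min_flops: int) -> Gemm:
    """Build a GEMM that sends large products to `device`, small ones to `host`.

    `min_flops` is a hypothetical tuning knob: below it, data transfer to the
    co-processor is assumed to cost more than the offload saves.
    """
    def gemm(a: Matrix, b: Matrix) -> Matrix:
        work = gemm_flops(len(a), len(b[0]), len(b))
        return device(a, b) if work >= min_flops else host(a, b)
    return gemm


def schur_update(c: Matrix, a: Matrix, b: Matrix, gemm: Gemm) -> Matrix:
    """Return C - A * B, the dense block update used in blocked factorization."""
    prod = gemm(a, b)
    return [[cij - pij for cij, pij in zip(crow, prow)]
            for crow, prow in zip(c, prod)]
```

Because the solver only sees the `gemm` callable, swapping the host routine for an accelerated one requires no change to the factorization logic itself. A sustained rate in GFlop/s would then be estimated as `gemm_flops(m, n, k) / seconds / 1e9` for a measured wall-clock time `seconds`.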