GPUs have become extremely promising multi-/many-core architectures for a wide range of demanding applications. The basic features of these architectures include a large number of relatively simple processing units operating in SIMD fashion, together with hardware-supported, advanced multithreading. However, the use of GPUs in everyday practice is still limited, mainly because implemented algorithms must be deeply adapted to the target architecture. In this work, we propose how to perform such an adaptation to achieve an efficient parallel implementation of the conjugate gradient (CG) algorithm, which is widely used for solving large sparse linear systems of equations arising, e.g., in FEM problems. Aiming at an efficient implementation of the main operation of the CG algorithm, sparse matrix-vector multiplication (SpMV), we propose and study different techniques for optimizing access to the hierarchical memory of GPUs. The experimental investigation of the proposed CUDA-based implementation of the CG algorithm is carried out on two GPU architectures: GeForce 8800 and Tesla C1060. It is shown that optimizing access to GPU memory considerably reduces the execution time of the SpMV operation, and consequently yields a significant speedup over CPUs for the whole CG algorithm.