We present a novel algorithm to solve dense linear systems using graphics processors (GPUs). We reduce matrix decomposition and row operations to a series of rasterization problems on the GPU. This reduction includes new techniques for streaming index pairs, swapping rows and columns, and parallelizing the computation to utilize multiple vertex and fragment processors. We also use data representations that match the rasterization order and cache technology of graphics processors. We have implemented our algorithm on different GPUs and compared its performance with optimized CPU implementations. In particular, our implementation on an NVIDIA GeForce 7800 GPU outperforms a CPU-based ATLAS implementation. Moreover, our results show that our algorithm is cache and bandwidth efficient and scales well with the number of fragment processors within the GPU and with the core GPU clock rate. We use our algorithm for fluid flow simulation and demonstrate that the commodity GPU is a useful co-processor for many scientific applications.
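
For orientation, the kind of computation being accelerated can be sketched on the CPU. The following is a minimal reference sketch in plain Python of Gaussian elimination with partial pivoting, assuming that this is a reasonable stand-in for the matrix decomposition, row swaps, and row operations mentioned above. It is not the GPU implementation described in the abstract: the function name `solve_dense` and the list-of-lists matrix layout are hypothetical choices made for illustration, and no rasterization or texture representation is modeled. The inner double loop marks the element updates that are independent of one another within a step, which is the part a data-parallel processor could plausibly execute concurrently.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Illustrative CPU sketch only; names and layout are assumptions,
    not taken from the paper.
    """
    n = len(a)
    # Work on an augmented copy [a | b] so the caller's data is untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for k in range(n):
        # Pivot search: row at or below k with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")

        # Row swap brings the pivot row into position k.
        m[k], m[p] = m[p], m[k]

        # Row operations: for a fixed k, every (i, j) update below is
        # independent of the others, so they could run in parallel.
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

Calling `solve_dense([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])` returns values close to `[2, 3, -1]`, up to floating-point rounding. The sketch uses partial (row) pivoting only; the abstract also mentions column swaps, which would correspond to a full-pivoting variant not shown here.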