The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API. Using the scan primitives, we show novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyze the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation using the scan framework for a tridiagonal matrix solver.
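The abstract names segmented scan as the central primitive, with sparse matrix-vector multiply and quicksort as applications. The sketch below is a small sequential Python model of those ideas. It is not the authors' CUDA implementation and may not match their formulation.

- It uses the simple Hillis-Steele log-step scan (O(n log n) work) rather than a work-efficient up-sweep/down-sweep scheme.
- The inner loops stand in for what GPU threads would execute in parallel at each step.
- The function names `segmented_inclusive_scan`, `exclusive_scan`, `split`, and `csr_spmv_segscan` are hypothetical, chosen for illustration.

```python
from typing import List, Sequence


def segmented_inclusive_scan(values: Sequence[float], head_flags: Sequence[int]) -> List[float]:
    """Inclusive segmented sum-scan in the Hillis-Steele log-step style.

    head_flags[i] == 1 marks the first element of a segment. Each pass of the
    while-loop models one data-parallel step in which every element i >= d
    reads the previous step's arrays at i - d, as lockstep GPU threads would.
    """
    v = list(values)
    f = [bool(b) for b in head_flags]
    n = len(v)
    d = 1
    while d < n:
        # Write into copies so no read observes a value from the current step.
        new_v, new_f = v[:], f[:]
        for i in range(d, n):
            if not f[i]:
                # No segment head in [i - d + 1, i], so extend the sum leftward.
                new_v[i] = v[i - d] + v[i]
            new_f[i] = f[i] or f[i - d]
        v, f = new_v, new_f
        d *= 2
    return v


def exclusive_scan(xs: Sequence[int]) -> List[int]:
    """Sequential reference for an exclusive sum-scan (prefix sum)."""
    out, total = [], 0
    for x in xs:
        out.append(total)
        total += x
    return out


def split(keys: Sequence[float], goes_right: Sequence[bool]) -> list:
    """Stable partition driven by exclusive scans over 0/1 flags.

    Elements with goes_right False keep their order at the front and the
    rest follow in order. Every element computes its destination
    independently, so the final scatter is data-parallel.
    """
    left = [0 if b else 1 for b in goes_right]
    right = [1 if b else 0 for b in goes_right]
    left_pos, right_pos = exclusive_scan(left), exclusive_scan(right)
    n_left = sum(left)
    out = [None] * len(keys)
    for i, k in enumerate(keys):
        dest = n_left + right_pos[i] if goes_right[i] else left_pos[i]
        out[dest] = k
    return out


def csr_spmv_segscan(values: Sequence[float], col_idx: Sequence[int],
                     row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """y = A x for A in CSR form, via one segmented scan over the products."""
    n_rows = len(row_ptr) - 1
    # One independent multiply per stored nonzero.
    products = [values[k] * x[col_idx[k]] for k in range(len(values))]
    # Flag the first nonzero of every non-empty row as a segment head.
    flags = [0] * len(values)
    for r in range(n_rows):
        if row_ptr[r] < row_ptr[r + 1]:
            flags[row_ptr[r]] = 1
    sums = segmented_inclusive_scan(products, flags)
    # The last element of each segment holds that row's dot product;
    # empty rows own no segment and get 0.0 directly.
    return [sums[row_ptr[r + 1] - 1] if row_ptr[r] < row_ptr[r + 1] else 0.0
            for r in range(n_rows)]
```

As a usage example, take A = [[1, 0, 2], [0, 0, 0], [0, 3, 4]] and x = [1.0, 2.0, 3.0]. In CSR form, A is stored as values [1.0, 2.0, 3.0, 4.0], col_idx [0, 2, 1, 2], and row_ptr [0, 2, 2, 4]. With these inputs, `csr_spmv_segscan` returns [7.0, 0.0, 18.0].

Likewise, `split([5, 1, 4, 2], [True, False, True, False])` returns [1, 2, 5, 4]. A scan-based quicksort could derive `goes_right` from comparisons against a pivot and apply such a split within each segment. That per-segment bookkeeping is omitted here.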