The CUDA programming model provides a straightforward means of describing inherently parallel computations, and NVIDIA's Tesla GPU architecture delivers high computational throughput on massively parallel problems. This article surveys experiences gained in applying CUDA to a diverse set of problems, along with the parallel speedups attained by executing key computations on the GPU, measured against sequential codes running on traditional CPU architectures.