Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidth. GPU architectures are becoming increasingly programmable, offering the potential for dramatic speedups over contemporary general-purpose processors (CPUs) on a variety of general-purpose applications. This paper uses NVIDIA's C-like CUDA language and an engineering sample of its recently introduced GTX 260 GPU to explore how effective GPUs are for a variety of application types, and describes specific coding idioms that improve performance on the GPU. GPU performance is compared to both single-core and multicore CPU performance, with the multicore CPU implementations written using OpenMP. The paper also discusses the advantages and inefficiencies of the CUDA programming model, along with some desirable features that might make it easier to use and allow it to support a larger body of applications more readily.