The GPU Computing Era

Authors:
John Nickolls;William J. Dally
Affiliations:
NVIDIA;NVIDIA
Venue:
IEEE Micro
Year:
2010

Citing 0
Cited 42

Accelerating CUDA graph algorithms at maximum warp

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Collision-streams: fast GPU-based collision detection for deformable models

I3D '11 Symposium on Interactive 3D Graphics and Games
Unstructured grid applications on GPU: performance analysis and improvement

Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
Improving programmability of heterogeneous many-core systems via explicit platform descriptions

Proceedings of the 4th International Workshop on Multicore Software Engineering
A free-viewpoint virtual mirror with marker-less user interaction

SCIA'11 Proceedings of the 17th Scandinavian conference on Image analysis
Solving a kind of boundary-value problem for ordinary differential equations using Fermi - The next generation CUDA computing architecture

Journal of Computational and Applied Mathematics
Iterative sparse Matrix-Vector multiplication for integer factorization on GPUs

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Accelerating the Explicitly Restarted Arnoldi Method with GPUs Using an Autotuned Matrix Vector Product

SIAM Journal on Scientific Computing
Identifying hotspots in a program for data parallel architecture: an early experience

Proceedings of the 5th India Software Engineering Conference
Better speedups using simpler parallel programming for graph connectivity and biconnectivity

Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Scalable framework for mapping streaming applications onto multi-GPU systems

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Cross teaching parallelism and ray tracing: a project-based approach to teaching applied parallel computing

Proceedings of the 43rd ACM technical symposium on Computer Science Education
Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Computing on GPU with CUDA and ELLPACK-R Sparse Format

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Why on-chip cache coherence is here to stay

Communications of the ACM
On the correctness of the SIMT execution model of GPUs

ESOP'12 Proceedings of the 21st European conference on Programming Languages and Systems
Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems

Proceedings of the 26th ACM international conference on Supercomputing
An efficient mixed-precision, hybrid CPU-GPU implementation of a nonlinearly implicit one-dimensional particle-in-cell algorithm

Journal of Computational Physics
Productivity of GPUs under different programming paradigms

Concurrency and Computation: Practice & Experience
Energy-efficient non-minimal path on-chip interconnection network for heterogeneous systems

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Simultaneous branch and warp interweaving for sustained GPU performance

Proceedings of the 39th Annual International Symposium on Computer Architecture
A simulation framework for scheduling performance evaluation on CPU-GPU heterogeneous system

ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part IV
GPGPU implementation of growing neural gas: Application to 3D scene reconstruction

Journal of Parallel and Distributed Computing
Workload and power budget partitioning for single-chip heterogeneous processors

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
GPU optimization of convolution for large 3-d real images

ACIVS'12 Proceedings of the 14th international conference on Advanced Concepts for Intelligent Vision Systems
Spill code placement for SIMD machines

SBLP'12 Proceedings of the 16th Brazilian conference on Programming Languages
RDFS reasoning on massively parallel hardware

ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I
Homogeneous and heterogeneous MPSoC architectures with network-on-chip connectivity for low-power and real-time multimedia signal processing

VLSI Design
Recognition of two-dimensional representation of urban environment for autonomous flying agents

Expert Systems with Applications: An International Journal
A multi-processor NoC-based architecture for real-time image/video enhancement

Journal of Real-Time Image Processing
Efficient design space exploration of GPGPU architectures

Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Exploring GPU architectures to accelerate semantic comparison for intention-based search

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
An investigation of the effects of error correcting code on GPU-accelerated molecular dynamics simulations

Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery
Aging-aware compiler-directed VLIW assignment for GPGPU architectures

Proceedings of the 50th Annual Design Automation Conference
Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Divergence analysis

ACM Transactions on Programming Languages and Systems (TOPLAS)
Energy-efficient multithreading for a hierarchical heterogeneous multicore through locality-cognizant thread generation

Journal of Parallel and Distributed Computing
GPU code generation for ODE-based applications with phased shared-data access patterns

ACM Transactions on Architecture and Code Optimization (TACO)
GPU-based iterative transmission reconstruction in 3D ultrasound computer tomography

Journal of Parallel and Distributed Computing
A comprehensive comparison of GPU- and FPGA-based acceleration of reflection image reconstruction for 3D ultrasound computer tomography

Journal of Real-Time Image Processing
Boosting CUDA Applications with CPU-GPU Hybrid Computing

International Journal of Parallel Programming
Use of GPU computing for uncertainty quantification in computational mechanics: A case study

Scientific Programming

Quantified Score

Hi-index	0.02

Abstract

GPU computing is at a tipping point, becoming more widely used in demanding consumer applications and high-performance computing. This article describes the rapid evolution of GPU architectures, from graphics processors to massively parallel many-core multiprocessors; recent developments in GPU computing architectures; and how the enthusiastic adoption of CPU+GPU coprocessing is accelerating parallel applications.