Understanding throughput-oriented architectures

Authors:
Michael Garland; David B. Kirk
Affiliations:
NVIDIA Research, Santa Clara, CA; NVIDIA Research, Santa Clara, CA
Venue:
Communications of the ACM
Year:
2010

Abstract

For workloads with abundant parallelism, GPUs deliver higher peak computational throughput than latency-oriented CPUs.