For workloads with abundant parallelism, GPUs deliver higher peak computational throughput than latency-oriented CPUs.