Radix sort for vector multiprocessors
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
Spert-II: A Vector Microprocessor System
Computer - Special issue: neural computing: companion issue to Spring 1996 IEEE Computational Science & Engineering
Vector instruction set support for conditional operations
Proceedings of the 27th annual international symposium on Computer architecture
Communications of the ACM - Special issue on computer architecture
VIS Speeds New Media Processing
IEEE Micro
Subword Parallelism with MAX-2
IEEE Micro
Decoupled vector architectures
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Vector microprocessors
Universal Mechanisms for Data-Parallel Architectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
The Vector-Thread Architecture
Proceedings of the 31st annual international symposium on Computer architecture
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Cache Refill/Access Decoupling for Vector Machines
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
RPU: a programmable ray processing unit for realtime ray tracing
ACM SIGGRAPH 2005 Papers
ICPP '06 Proceedings of the 2006 International Conference on Parallel Processing
Compiling for vector-thread architectures
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
The Cray BlackWidow: a highly scalable vector multiprocessor
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Scalable Parallel Programming with CUDA
Queue - GPU Computing
Implementing the scale vector-thread processor
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Vector-thread architecture and implementation
Vector-thread architecture and implementation
A functional description of the Lincoln TX-2 computer
IRE-AIEE-ACM '57 (Western) Papers presented at the February 26-28, 1957, western joint computer conference: Techniques for reliability
Intel threading building blocks
Intel threading building blocks
Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM - A Direct Path to Dependable Software
Tradeoffs in designing accelerator architectures for visual computing
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware
ACM Transactions on Architecture and Code Optimization (TACO)
Rigel: an architecture and scalable programming interface for a 1000-core accelerator
Proceedings of the 36th annual international symposium on Computer architecture
A Task-Centric Memory Model for Scalable Accelerator Architectures
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Performance evaluation of NEC SX-9 using real science and engineering applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
The IBM System/370 vector architecture
IBM Systems Journal
Simplified vector-thread architectures for flexible and efficient data-parallel accelerators
Simplified vector-thread architectures for flexible and efficient data-parallel accelerators
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators
Proceedings of the 38th annual international symposium on Computer architecture
Hi-index | 0.00 |
We present a taxonomy and modular implementation approach for data-parallel accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) architectural design patterns. We introduce Maven, a new VT microarchitecture based on the traditional vector-SIMD microarchitecture, that is considerably simpler to implement and easier to program than previous VT designs. Using an extensive design-space exploration of full VLSI implementations of many accelerator design points, we evaluate the varying tradeoffs between programmability and implementation efficiency among the MIMD, vector-SIMD, and VT patterns on a workload of compiled microbenchmarks and application kernels. We find the vector cores provide greater efficiency than the MIMD cores, even on fairly irregular kernels. Our results suggest that the Maven VT microarchitecture is superior to the traditional vector-SIMD architecture, providing both greater efficiency and easier programmability.