The Vector-Thread Architecture

Authors:
Ronny Krashinsky;Christopher Batten;Mark Hampton;Steve Gerding;Brian Pharris;Jared Casper;Krste Asanovic
Affiliations:
MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA
Venue:
Proceedings of the 31st annual international symposium on Computer architecture
Year:
2004

Citing 13
Cited 30

Dynamic Instruction Scheduling and the Astronautics ZS-1

Computer
Multi-threaded vectorization

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Spert-II: A Vector Microprocessor System

Computer - Special issue: neural computing: companion issue to Spring 1996 IEEE Computational Science & Engineering
A bandwidth-efficient architecture for media processing

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Smart Memories: a modular reconfigurable architecture

Proceedings of the 27th annual international symposium on Computer architecture
The CRAY-1 computer system

Communications of the ACM - Special issue on computer architecture
Implementing an efficient vector instruction set in a chip multi-processor using micro-threaded pipelines

ACSAC '01 Proceedings of the 6th Australasian conference on Computer systems architecture
Scalable Processors in the Billion-Transistor Era: IRAM

Computer
Baring It All to Software: Raw Machines

Computer
Overcoming the limitations of conventional vector processors

Proceedings of the 30th annual international symposium on Computer architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Proceedings of the 30th annual international symposium on Computer architecture
Scalable vector media-processors for embedded systems

Scalable vector media-processors for embedded systems

Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams

Proceedings of the 31st annual international symposium on Computer architecture
Cache Refill/Access Decoupling for Vector Machines

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
The Vector-Thread Architecture

IEEE Micro
The design and implementation of a low-latency on-chip network

ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Tile size selection for low-power tile-based architectures

Proceedings of the 3rd conference on Computing frontiers
Implementing virtual memory in a vector processor with software restart markers

Proceedings of the 20th annual international conference on Supercomputing
The potential energy efficiency of vector acceleration

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
ALP: Efficient support for all levels of parallelism for complex media applications

ACM Transactions on Architecture and Code Optimization (TACO)
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors

Proceedings of the 21st annual international conference on Supercomputing
An embedded coherent-multithreading multimedia processor and its programming model

Proceedings of the 44th annual Design Automation Conference
Compiling for vector-thread architectures

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Implementing the scale vector-thread processor

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Using Application Bisection Bandwidth to Guide Tile Size Selection for the Synchroscalar Tile-Based Architecture

Transactions on High-Performance Embedded Architectures and Compilers I
Vector Processing as a Soft Processor Accelerator

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Customized kernel execution on reconfigurable hardware for embedded applications

Microprocessors & Microsystems
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware

ACM Transactions on Architecture and Code Optimization (TACO)
AnySP: anytime anywhere anyway signal processing

Proceedings of the 36th annual international symposium on Computer architecture
A VLIW vector media coprocessor with cascaded SIMD ALUs

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Reconfiguration support for vector operations

International Journal of High Performance Systems Architecture
Dynamic warp subdivision for integrated branch and memory divergence tolerance

Proceedings of the 37th annual international symposium on Computer architecture
Understanding throughput-oriented architectures

Communications of the ACM
An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing

Journal of Parallel and Distributed Computing
A pattern for efficient parallel computation on multicore processors with scalar operand networks

Proceedings of the 2010 Workshop on Parallel Programming Patterns
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

Proceedings of the 38th annual international symposium on Computer architecture
SIMD defragmenter: efficient ILP realization on data-parallel architectures

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Improving GPU performance via large warps and two-level warp scheduling

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Exploring memory consistency for massively-threaded throughput-oriented processors

Proceedings of the 40th Annual International Symposium on Computer Architecture
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators

ACM Transactions on Computer Systems (TOCS)
Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.02

Visualization

Abstract

The vector-thread (VT) architectural paradigm unifies the vectorand multithreaded compute models. The VT abstraction providesthe programmer with a control processor and a vector of virtualprocessors (VPs). The control processor can use vector-fetch commandsto broadcast instructions to all the VPs or each VP can usethread-fetches to direct its own control flow. A seamless intermixingof the vector and threaded control mechanisms allows a VT architectureto flexibly and compactly encode application parallelismand locality, and a VT machine exploits these to improve performanceand efficiency. We present SCALE, an instantiation of theVT architecture designed for low-power and high-performance embeddedsystems. We evaluate the SCALE prototype design usingdetailed simulation of a broad range of embedded applications andshow that its performance is competitive with larger and more complexprocessors.