The Vector-Thread Architecture
Proceedings of the 31st annual international symposium on Computer architecture
Cache Refill/Access Decoupling for Vector Machines
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Closing the POWER Gap between ASIC & Custom: Tools and Techniques for Low Power Design
Closing the POWER Gap between ASIC & Custom: Tools and Techniques for Low Power Design
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Compiling for vector-thread architectures
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Vector-thread architecture and implementation
Vector-thread architecture and implementation
OpenCL memory infrastructure for FPGAs (abstract only)
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators
ACM Transactions on Computer Systems (TOCS)
Hi-index | 0.00 |
The Scale vector-thread processor is a complexity-effective solution for embedded computing which flexibly supports both vector and highly multithreaded processing. The 7.1-million transistor chip has 16 decoupled execution clusters, vector load and store units, and a nonblocking 32KB cache. An automated and iterative design and verification flow enabled a performance-, power-, and area-efficient implementation with two person-years of development effort. Scale has a core area of 16.6 mm2 in 180 nm technology, and it consumes 400 mW--1.1 W while running at 260 MHz.