Available instruction-level parallelism for superscalar and superpipelined machines. ASPLOS III: Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems.
Code optimizers and register organizations for vector architectures.
Designing the TFP Microprocessor. IEEE Micro.
Evaluation of design alternatives for a multiprocessor microprocessor. ISCA '96: Proceedings of the 23rd Annual International Symposium on Computer Architecture.
Complexity-effective superscalar processors. Proceedings of the 24th Annual International Symposium on Computer Architecture.
Initial results on the performance and cost of vector microprocessors. MICRO 30: Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture.
Computer architecture (2nd ed.): a quantitative approach.
A Chip-Multiprocessor Architecture with Speculative Multithreading. IEEE Transactions on Computers.
Exploiting a new level of DLP in multimedia applications. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
Variability in the execution of multimedia applications and implications for architecture. ISCA '01: Proceedings of the 28th Annual International Symposium on Computer Architecture.
A Simulation Study of Decoupled Vector Architectures. The Journal of Supercomputing.
Billion-Transistor Architectures. Computer.
Subword Parallelism with MAX-2. IEEE Micro.
The visual instruction set (VIS) in UltraSPARC. COMPCON '95: Proceedings of the 40th IEEE Computer Society International Conference.
Vector microprocessors.
Matrix bidiagonalization: implementation and evaluation on the Trident processor. Neural, Parallel & Scientific Computations.
Within a few years it will be possible to integrate a billion transistors on a single chip. At that integration level, we propose using a high-level ISA to express parallelism to hardware, rather than spending a huge transistor budget on extracting it dynamically. Since the fundamental data structures of a wide variety of applications are scalars, vectors, and matrices, our proposed Trident processor extends the classical vector ISA with matrix operations. The Trident processor consists of a set of parallel vector pipelines (PVPs) combined with a fast in-order scalar core. The PVPs can access both vector and matrix register files to perform vector, matrix, and matrix-vector operations. One key point of our design is the exploitation of up to three levels of data parallelism. Another is the use of ring register files for storing vector and matrix data. The ring structure reduces the number and size of the address decoders, the number of ports, the area overhead of the address bus, and the number of registers attached to the bit lines, while also providing local communication between PVPs. Scaling the Trident processor requires no additional fetch, decode, or issue bandwidth; it requires only replicating PVPs and enlarging the register files. Scientific, engineering, multimedia, and many other applications that mix scalar, vector, and matrix operations can be sped up on the Trident processor.
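To make the abstract's partitioning idea concrete, the following is a minimal illustrative sketch, not taken from the paper: it models how a matrix-vector product might be distributed across several parallel vector pipelines (PVPs), with each pipeline handling an interleaved subset of matrix rows. The function name, the round-robin row assignment, and the pipeline count are all assumptions made for illustration; real PVPs would execute their row slices concurrently against the matrix and vector register files.

```python
def matvec_on_pvps(matrix, vector, num_pvps=4):
    """Illustrative only: split the rows of `matrix` round-robin across
    `num_pvps` pipelines; each pipeline computes the dot products for
    its own rows independently of the others."""
    rows = len(matrix)
    result = [0] * rows
    # Each "pipeline" here is just a row subset iterated sequentially;
    # on the proposed hardware these loops would run in parallel.
    for pvp in range(num_pvps):
        for r in range(pvp, rows, num_pvps):
            result[r] = sum(m * v for m, v in zip(matrix[r], vector))
    return result

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
print(matvec_on_pvps(A, x))  # [3, 7, 11, 15]
```

Because each row's dot product is independent, no cross-pipeline communication is needed for this operation; the ring register files described in the abstract would matter for operations where neighboring PVPs must exchange operands.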