Trident: a scalable architecture for scalar, vector, and matrix operations

  • Authors:
  • Mostafa I. Soliman;Stanislav G. Sedukhin

  • Affiliations:
  • The University of Aizu, Aizu-Wakamatsu City Fukushima, 965-8580 Japan;The University of Aizu, Aizu-Wakamatsu City Fukushima, 965-8580 Japan

  • Venue:
  • CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Within a few years it will be possible to integrate a billion transistors on a single chip. At this integration level, we propose using a high level ISA to express parallelism to hardware instead of using a huge transistor budget to dynamically extract it. Since the fundamental data structures for a wide variety of applications are scalar, vector, and matrix, our proposed Trident processor extends the classical vector ISA with matrix operations. The Trident processor consists of a set of parallel vector pipelines (PVPs) combined with a fast in order scalar core. The PVPs can access both vector and matrix register files to perform vector, matrix, and matrix-vector operations. One key point of our design is the exploitation of up to three levels of data parallelism. Another key point is the ring register files for storing vector and matrix data. The ring structure of the register files reduces the number and size of the address decoders, the number of ports, the area overhead caused by the address bus, and the number of registers attached to bit lines, as well as providing local communication between PVPs. The scalability of the Trident processor does not require more fetch, decode, or issue bandwidth, but requires replication of PVPs and increasing the register file size. Scientific, engineering, multimedia, and many other applications, which are based on a mixture of scalar, vector, and matrix operations, can be speeded up on the Trident processor.