Codevelopment of multi-level instruction set architecture and hardware for an efficient matrix processor

  • Authors:
  • Mostafa I. Soliman; Abdulmajid F. Al-Junaid

  • Affiliations:
  • Electrical Engineering Department, Faculty of Engineering, South Valley University, Aswan, Egypt; Electrical Engineering Department, Faculty of Engineering, South Valley University, Aswan, Egypt

  • Venue:
  • Neural, Parallel & Scientific Computations
  • Year:
  • 2010

Abstract

The instruction set architecture (ISA) is the part of the processor that is visible to the programmer or compiler writer. A multi-level ISA is proposed to communicate data parallelism explicitly and compactly to the hardware (processor), instead of relying on dynamic extraction with complex hardware or static extraction with sophisticated compiler techniques. This paper presents the codevelopment of a multi-level ISA and hardware for an efficient matrix processor called Mat-Core. Mat-Core extends a general-purpose scalar processor with a matrix unit for processing vector/matrix data. To hide memory latency, the matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues. As in vector architectures, the data computation unit is organized in parallel lanes. However, on these parallel lanes Mat-Core can execute scalar-matrix, vector-matrix, and matrix-matrix instructions in addition to scalar-vector and vector-vector instructions. Mat-Core leads to a compiler model that is efficient in terms of both performance and executable code size. On a Mat-Core with four parallel lanes and matrix registers of size 8×4 (32 elements), our results show performance of about 1.6, 2.1, 4.1, and 6.4 FLOPs per clock cycle on scalar-vector multiplication, SAXPY, vector-matrix multiplication, and matrix-matrix multiplication, respectively.
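The decoupled organization described in the abstract can be illustrated with a small functional model. This is a hypothetical sketch, not Mat-Core's actual microarchitecture or ISA: the function names (`address_generator`, `load_into_queue`, `lanes_execute`, `saxpy`, `vector_matrix`), the element-to-lane mapping (element i on lane i mod 4), and the one-column-per-lane vector-matrix scheme are assumptions made for illustration. It models only what the abstract states: an address-generation side that fills data queues ahead of use, and a data-computation side that drains those queues on four parallel lanes, with an 8×4 (32-element) matrix register.

```python
from collections import deque

# Assumed configuration, taken from the abstract's reported setup.
NUM_LANES = 4
REG_ROWS, REG_COLS = 8, 4  # one 8x4 matrix register = 32 elements


def address_generator(base, count, stride=1):
    """Hypothetical address-generation unit: yields element addresses."""
    for i in range(count):
        yield base + i * stride


def load_into_queue(memory, base, count, queue, stride=1):
    """Access side: fetch operands and enqueue them for the compute side."""
    for addr in address_generator(base, count, stride):
        queue.append(memory[addr])


def lanes_execute(op, xq, yq, count):
    """Compute side: element i is processed by lane i % NUM_LANES."""
    lanes = [[] for _ in range(NUM_LANES)]
    for i in range(count):
        lanes[i % NUM_LANES].append(op(xq.popleft(), yq.popleft()))
    # Lane k holds elements k, k+4, k+8, ...; reassemble in element order.
    return [lanes[i % NUM_LANES][i // NUM_LANES] for i in range(count)]


def saxpy(a, memory, x_base, y_base, n):
    """y = a*x + y, with operands delivered through two data queues."""
    xq, yq = deque(), deque()
    load_into_queue(memory, x_base, n, xq)
    load_into_queue(memory, y_base, n, yq)
    return lanes_execute(lambda x, y: a * x + y, xq, yq, n)


def vector_matrix(x, m):
    """y = x^T M for an 8x4 register; lane j accumulates column j.

    Assumes REG_COLS == NUM_LANES so each lane owns exactly one column.
    """
    assert len(x) == REG_ROWS and all(len(row) == REG_COLS for row in m)
    acc = [0.0] * NUM_LANES
    for i in range(REG_ROWS):            # one row of M per step
        for lane in range(NUM_LANES):    # in hardware, these run in parallel
            acc[lane] += x[i] * m[i][lane]
    return acc


memory = [float(v) for v in range(16)]  # x at addresses 0..7, y at 8..15
print(saxpy(2.0, memory, 0, 8, 8))
print(vector_matrix([1.0] * 8, [[1.0, 2.0, 3.0, 4.0]] * 8))
```

In this model SAXPY costs 2 FLOPs per element (one multiply, one add), and an 8×4 vector-matrix product costs 2 × 32 = 64 FLOPs. The sketch does not model timing, so it says nothing about the cycle counts behind the reported FLOPs-per-cycle figures; it only shows how queues separate operand delivery from lane-parallel computation.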