Mat-core: a decoupled matrix core extension for general-purpose processors

  • Authors:
  • Mostafa I. Soliman

  • Affiliations:
  • Computer & System Section, Electrical Engineering Department, Aswan Faculty of Engineering, South Valley University, Aswan, Egypt

  • Venue:
  • Neural, Parallel & Scientific Computations
  • Year:
  • 2011

Abstract

This paper proposes a new processor architecture that exploits the increasing number of transistors per integrated circuit to improve the performance of many applications on general-purpose processors. The proposed processor, called Mat-Core, uses a multi-level ISA to communicate data parallelism to the processor explicitly and compactly, rather than extracting it dynamically with complex hardware or statically with sophisticated compiler techniques. Scalar-scalar (level-0), scalar-vector (level-1), vector-vector (level-1), vector-matrix (level-2), and matrix-matrix (level-3) instruction sets serve as a multi-level interface between hardware and software. Mat-Core extends a general-purpose scalar processor, which executes scalar instructions, with a matrix unit that executes vector/matrix instructions. To tolerate memory latency, the matrix unit is decoupled into two components: address generation and data computation. The data computation unit is organized in parallel lanes; each lane contains a pipeline of each functional unit and a slice of the matrix register file. On these parallel lanes, Mat-Core can effectively process not only vector but also matrix data. This paper explains how vector/matrix instructions execute on the parallel lanes of Mat-Core. Moreover, the performance of element-wise vector-vector addition, vector-matrix multiplication, and matrix-matrix multiplication is estimated on the decoupled Mat-Core processor. The increasing transistor budget can also be exploited to scale the Mat-Core processor by providing more cores in a physical package. On such a Multi-Mat-Core processor, performance would be improved by processing threads of code in parallel using multi-threading techniques.
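To make the level hierarchy named in the abstract concrete, the following is a minimal software sketch in plain Python. It is not the Mat-Core ISA or its hardware: the function names, the lane count, and the interleaved element-to-lane mapping are all assumptions chosen for illustration. It shows how a level-2 operation can be composed from level-1 operations, how a level-3 operation can be composed from level-2 operations, and how element-wise work might be spread over parallel lanes.

```python
# Illustrative sketch only. All names and the lane mapping below are
# hypothetical; they are not taken from the Mat-Core paper.

NUM_LANES = 4  # assumed lane count, chosen arbitrarily for the example


def sv_mul(s, v):
    """Level-1 (scalar-vector): multiply every element of v by scalar s."""
    return [s * x for x in v]


def vv_add(a, b):
    """Level-1 (vector-vector): element-wise addition of equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def vm_mul(v, m):
    """Level-2 (vector-matrix): row vector v times matrix m.

    Built from level-1 operations: each element of v scales one row of m,
    and the scaled rows are accumulated.
    """
    acc = [0] * len(m[0])
    for s, row in zip(v, m):
        acc = vv_add(acc, sv_mul(s, row))
    return acc


def mm_mul(a, b):
    """Level-3 (matrix-matrix): one vector-matrix product per row of a."""
    return [vm_mul(row, b) for row in a]


def lane_vv_add(a, b, lanes=NUM_LANES):
    """Element-wise addition with element i handled by lane i % lanes.

    The interleaved mapping is an assumption; the loop over lanes runs
    sequentially here, whereas hardware lanes would work concurrently.
    """
    out = [0] * len(a)
    for lane in range(lanes):
        for i in range(lane, len(a), lanes):
            out[i] = a[i] + b[i]
    return out
```

For example, `mm_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` issues two level-2 products, each of which issues two scalar-vector multiplies and two vector-vector adds. This mirrors the idea that a single higher-level instruction compactly encodes many lower-level operations.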