High-Performance Matrix Multiply on a Massively Multithreaded Fiteng1000 Processor

  • Authors:
  • Jie Liu; Lihua Chi; Chunye Gong; Han Xu; Jie Jiang; Yihui Yan; Qingfeng Hu

  • Affiliations:
  • Section 605, College of Computer Science, National University of Defense Technology, Changsha, China (all authors)

  • Venue:
  • ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
  • Year:
  • 2012

Abstract

Matrix multiplication is an essential building block of many linear algebra operations and applications. This paper presents parallel matrix multiplication algorithms for the massively multithreaded Fiteng1000 processor in which either matrix A or matrix B is shared in memory. We discuss how to implement these algorithms on a multi-core processor that runs many threads. To achieve good performance, it is important to choose the 2D spatial topology of the threads, the memory level in which the matrices are placed, and the matrix sizes. The parallel codes are written in C and assembly language under the OpenMP parallel programming environment. Performance results on the Fiteng1000 processor show that the algorithms achieve good parallel performance and near-peak performance.
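The following is a minimal sketch that shows how a 2D thread topology determines which operand the threads share. It is not the authors' Fiteng1000 code. It assumes dense row-major `double` matrices and plain OpenMP. The names `matmul_2d_grid` and `block_range` are hypothetical. The sketch does not model the processor's memory levels or the assembly kernels the paper mentions. The output C = A·B is tiled over a pr × pc grid of threads. The thread that owns tile (r, c) reads only row block r of A and column block c of B, so the shape of the grid decides how much of each operand every thread touches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bounds [lo, hi) of block `idx` when `n` items are
   split as evenly as possible into `parts` contiguous blocks. */
static void block_range(size_t n, size_t parts, size_t idx,
                        size_t *lo, size_t *hi)
{
    size_t base = n / parts, rem = n % parts;
    *lo = idx * base + (idx < rem ? idx : rem);
    *hi = *lo + base + (idx < rem ? 1 : 0);
}

/* Hypothetical sketch: C = A * B with row-major A (m x k), B (k x n),
   C (m x n). Each of the pr * pc threads computes one tile of C.
   Build with e.g. `cc -O2 -fopenmp`; without -fopenmp the pragma is
   ignored and the tiles are computed serially with the same result. */
void matmul_2d_grid(size_t m, size_t n, size_t k,
                    const double *A, const double *B, double *C,
                    int pr, int pc)
{
    assert(pr > 0 && pc > 0);   /* the thread grid must be non-empty */

    /* collapse(2) turns the pr x pc tile grid into pr * pc iterations;
       with static scheduling and pr * pc threads, each thread gets one tile. */
    #pragma omp parallel for collapse(2) schedule(static) num_threads(pr * pc)
    for (int r = 0; r < pr; r++) {
        for (int c = 0; c < pc; c++) {
            size_t i0, i1, j0, j1;
            block_range(m, (size_t)pr, (size_t)r, &i0, &i1);  /* rows of tile */
            block_range(n, (size_t)pc, (size_t)c, &j0, &j1);  /* cols of tile */
            for (size_t i = i0; i < i1; i++) {
                for (size_t j = j0; j < j1; j++)
                    C[i * n + j] = 0.0;
                /* i-p-j loop order walks B and C row-wise (cache friendly). */
                for (size_t p = 0; p < k; p++) {
                    double a = A[i * k + p];
                    for (size_t j = j0; j < j1; j++)
                        C[i * n + j] += a * B[p * n + j];
                }
            }
        }
    }
}
```

For example, a 1 × P grid gives every thread all of A, so A is shared, and a disjoint column slab of B. A P × 1 grid does the reverse: it shares B and splits A by rows. Intermediate shapes trade off how much of each operand a single thread reads. On real hardware, that choice interacts with where each matrix is placed in memory.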