Matrix-based streamization approach for improving locality and parallelism on FT64 stream processor

  • Authors:
  • Xuejun Yang; Jing Du; Xiaobo Yan; Yu Deng

  • Affiliations:
  • PDL, School of Computer, National University of Defense Technology, Changsha, China 410073 (all four authors)

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2009

Abstract

FT64 is the first 64-bit stream processor designed for scientific computing. Because the direct streamization approach is inefficient on FT64, optimized streamization approaches are critical for running scientific applications efficiently on it. In this paper, we propose a novel matrix-based streamization approach that improves the locality and parallelism of scientific applications on FT64. First, we build a Data&Computation Matrix that abstracts the relationship between the loops and arrays of the original program, which helps formulate the streamization problem. Second, based on transformations of this matrix, we propose three key optimization techniques: coarse-grained program transformations, fine-grained program transformations, and stream organization optimizations. Finally, we apply our approach to ten typical scientific application kernels on FT64. The experimental results show that the matrix-based streamization approach achieves an average speedup of 2.76 over the direct streamization approach, and that it performs as well as or better than the corresponding Fortran programs on Itanium 2 for every kernel except CG. These results indicate that the matrix-based streamization approach is a promising and practical way to efficiently exploit the tremendous potential of FT64.
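The abstract describes the Data&Computation Matrix only at a high level. Below is a minimal Python sketch that assumes a hypothetical encoding: each row is a loop, each column is an array, and each cell records whether the loop reads and/or writes that array. The class name `DataComputationMatrix`, the `READ`/`WRITE` flags, and the greedy `fusion_groups` heuristic are illustrative assumptions, not the paper's definitions. The heuristic groups adjacent loops that touch a common array, serving as a simple locality-driven stand-in for a coarse-grained transformation. It does not check data dependences, so the groups it produces are only candidates.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical access encoding: bit flags combined with "|".
READ, WRITE = 1, 2


@dataclass
class DataComputationMatrix:
    """Rows are loops, columns are arrays, cells hold READ/WRITE flags (assumed layout)."""
    loops: list[str]
    arrays: list[str]
    cells: list[list[int]] = field(init=False)

    def __post_init__(self) -> None:
        # Start with no loop touching any array.
        self.cells = [[0] * len(self.arrays) for _ in self.loops]

    def mark(self, loop: str, array: str, mode: int) -> None:
        """Record that `loop` accesses `array` with the given mode flags."""
        i = self.loops.index(loop)
        j = self.arrays.index(array)
        self.cells[i][j] |= mode

    def arrays_of(self, loop: str) -> set[str]:
        """Arrays accessed (read or written) by one loop, i.e. its nonzero row entries."""
        row = self.cells[self.loops.index(loop)]
        return {a for a, c in zip(self.arrays, row) if c}

    def shared(self, l1: str, l2: str) -> set[str]:
        """Arrays touched by both loops, a rough indicator of reusable data."""
        return self.arrays_of(l1) & self.arrays_of(l2)

    def fusion_groups(self) -> list[list[str]]:
        """Greedily merge adjacent loops sharing an array (dependences NOT checked)."""
        if not self.loops:
            return []
        groups = [[self.loops[0]]]
        for prev, cur in zip(self.loops, self.loops[1:]):
            if self.shared(prev, cur):
                groups[-1].append(cur)
            else:
                groups.append([cur])
        return groups


if __name__ == "__main__":
    m = DataComputationMatrix(["L1", "L2", "L3"], ["a", "b", "c", "d"])
    m.mark("L1", "a", READ)
    m.mark("L1", "b", WRITE)
    m.mark("L2", "b", READ)
    m.mark("L2", "c", WRITE)
    m.mark("L3", "d", READ | WRITE)
    print(m.fusion_groups())  # [['L1', 'L2'], ['L3']]
```

In this toy input, loops `L1` and `L2` share array `b` (written by one, read by the other), so they fall into one candidate group, while `L3` touches only `d` and stays separate. A real streamization pass would also need to verify that merging the loops preserves their dependences before treating such a group as a single kernel.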