Sparse Matrix Block-Cyclic Realignment on Distributed Memory Machines

  • Authors:
  • Ching-Hsien Hsu

  • Affiliations:
  • Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, Taiwan

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Modifying the data distribution over the course of a program to adapt to variations in the data access patterns may leads to significant computational benefits in many scientific applications. Therefore, dynamic realignment of data is used to enhance algorithm performance in parallel programs on distributed memory machines. This paper presents a new method aims to the efficiency of block-cyclic data realignment of sparse matrix. The main idea of the proposed technique is first todevelop closed forms for generating the Vector Index Set (VIS) of each source/destination processor. Based on the vector index set and the nonzero structure of sparse matrix, two efficient algorithms,vector2message (v2m) and message2vector (m2v) can be derived. The proposed technique uses v2m to extract nonzero elements from source compressed structures and packs them into messages in the source stage; and uses m2v to unpack each received messages and construct the destination matrix in the destination stage. A significant improvement of this approach is that a processor does not need to determine the complicated sending or receiving data sets for dynamic data redistribution. The indexing cost is reduced obviously. The second advantage of the present techniques is the achievement of optimal packing/unpacking stages consequent upon the informative VIS tables. Another contribution of our methods is the ability to handle sparse matrix redistribution under two disjoint processor grids in the source and destination phases. A theoretical model to analyze the performance of the proposed technique is also presented in this work. To evaluate the performance of our methods, we have implemented the present algorithms on an IBM SP2 parallel machine along with the Histogram method and a dense redistribution strategy. The experimental results show that our technique provides significant improvement for runtime data redistribution of sparse matrices in most test samples.