Sparse Matrix Block-Cyclic Realignment on Distributed Memory Machines

Authors:
Ching-Hsien Hsu
Affiliations:
Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, Taiwan
Venue:
The Journal of Supercomputing
Year:
2005


Abstract

Modifying the data distribution over the course of a program to adapt to variations in data access patterns may lead to significant computational benefits in many scientific applications. Dynamic realignment of data is therefore used to enhance algorithm performance in parallel programs on distributed memory machines. This paper presents a new method for efficient block-cyclic data realignment of sparse matrices. The main idea of the proposed technique is first to develop closed forms for generating the Vector Index Set (VIS) of each source/destination processor. Based on the vector index set and the nonzero structure of the sparse matrix, two efficient algorithms, vector2message (v2m) and message2vector (m2v), can be derived. The proposed technique uses v2m to extract nonzero elements from the source compressed structures and pack them into messages in the source stage, and uses m2v to unpack each received message and construct the destination matrix in the destination stage. A significant advantage of this approach is that a processor does not need to determine the complicated sending or receiving data sets for dynamic data redistribution, which reduces the indexing cost. The second advantage is that the informative VIS tables make optimal packing/unpacking stages achievable. Another contribution of our methods is the ability to handle sparse matrix redistribution between two disjoint processor grids in the source and destination phases. A theoretical model for analyzing the performance of the proposed technique is also presented. To evaluate the performance of our methods, we implemented the algorithms on an IBM SP2 parallel machine along with the Histogram method and a dense redistribution strategy. The experimental results show that our technique significantly improves runtime data redistribution of sparse matrices in most test samples.
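
The two-stage pack/unpack flow described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the closed-form VIS generation is not reproduced here, and owners are instead computed row by row. The sketch assumes a one-dimensional, row-wise block-cyclic distribution, a compressed row storage (row_ptr, col_idx, vals) on each processor, and a single-process simulation in place of real message passing. The names v2m and m2v are borrowed from the abstract; their bodies and all helper names (owner, local_index, global_index, to_crs, scatter, redistribute) are hypothetical.

```python
from collections import defaultdict

def owner(g, block, nprocs):
    """Rank that owns global row g under a cyclic(block) distribution."""
    return (g // block) % nprocs

def local_index(g, block, nprocs):
    """Local row index of global row g on its owning rank."""
    return (g // (block * nprocs)) * block + g % block

def global_index(l, rank, block, nprocs):
    """Global row index of local row l held by the given rank."""
    return (l // block) * block * nprocs + rank * block + l % block

def to_crs(n_rows, entries):
    """Build (row_ptr, col_idx, vals) from (row, col, val) triples."""
    entries = sorted(entries)
    row_ptr = [0] * (n_rows + 1)
    for r, _, _ in entries:
        row_ptr[r + 1] += 1
    for r in range(n_rows):
        row_ptr[r + 1] += row_ptr[r]
    return row_ptr, [c for _, c, _ in entries], [v for _, _, v in entries]

def scatter(dense, block, nprocs):
    """Distribute a dense list-of-lists matrix; one (n_local, crs) per rank."""
    parts = []
    for rank in range(nprocs):
        rows = [g for g in range(len(dense)) if owner(g, block, nprocs) == rank]
        entries = [(local_index(g, block, nprocs), c, v)
                   for g in rows for c, v in enumerate(dense[g]) if v != 0]
        parts.append((len(rows), to_crs(len(rows), entries)))
    return parts

def v2m(rank, n_local, crs, src_block, src_procs, dst_block, dst_procs):
    """Source stage: pack local nonzeros into one message per destination."""
    row_ptr, col_idx, vals = crs
    messages = defaultdict(list)
    for l in range(n_local):
        g = global_index(l, rank, src_block, src_procs)
        dest = owner(g, dst_block, dst_procs)
        for k in range(row_ptr[l], row_ptr[l + 1]):
            messages[dest].append((g, col_idx[k], vals[k]))
    return messages

def m2v(received, n_local, dst_block, dst_procs):
    """Destination stage: unpack messages into the local compressed structure."""
    entries = [(local_index(g, dst_block, dst_procs), c, v)
               for msg in received for g, c, v in msg]
    return to_crs(n_local, entries)

def redistribute(src_parts, n_rows, src_block, src_procs, dst_block, dst_procs):
    """Simulate the realignment from cyclic(src_block) to cyclic(dst_block)."""
    inbox = defaultdict(list)
    for rank, (n_local, crs) in enumerate(src_parts):
        packed = v2m(rank, n_local, crs, src_block, src_procs, dst_block, dst_procs)
        for dest, msg in packed.items():
            inbox[dest].append(msg)
    result = []
    for rank in range(dst_procs):
        n_local = sum(1 for g in range(n_rows)
                      if owner(g, dst_block, dst_procs) == rank)
        result.append((n_local, m2v(inbox[rank], n_local, dst_block, dst_procs)))
    return result
```

In this sketch the source and destination processor counts may differ, which loosely mirrors the abstract's remark about separate source and destination processor grids. A quick sanity check is to compare the output of redistribute against scatter applied directly with the destination parameters.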