Asymptotically Tight Bounds for Performing BMMC Permutations on Parallel Disk Systems

Authors:
Thomas H. Cormen;Thomas Sundquist;Leonard F. Wisniewski
Affiliations:
-;-;-
Venue:
SIAM Journal on Computing
Year:
1999

Citing 0
Cited 15

Multiprocessor out-of-core FFTs with distributed memory and parallel disks (extended abstract)
Proceedings of the fifth workshop on I/O in parallel and distributed systems

External memory algorithms
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

Multidimensional, multiprocessor, out-of-core FFTs with distributed memory and parallel disks (extended abstract)
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures

Towards a theory of cache-efficient algorithms
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms

External memory algorithms and data structures: dealing with massive data
ACM Computing Surveys (CSUR)

Towards a theory of cache-efficient algorithms
Journal of the ACM (JACM)

An Efficient Algorithm for Out-of-Core Matrix Transposition
IEEE Transactions on Computers

External Memory Algorithms
ESA '98 Proceedings of the 6th Annual European Symposium on Algorithms

External memory algorithms
Handbook of massive data sets

Building on a Framework: Using FG for More Flexibility and Improved Performance in Parallel Programs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01

Optimal sparse matrix dense vector multiplication in the I/O-model
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures

Efficient parallel out-of-core matrix transposition
International Journal of High Performance Computing and Networking

Combating I-O bottleneck using prefetching: model, algorithms, and ramifications
The Journal of Supercomputing

Algorithms and data structures for external memory
Foundations and Trends® in Theoretical Computer Science

Algorithmic ramifications of prefetching in memory hierarchy
HiPC'06 Proceedings of the 13th international conference on High Performance Computing

Abstract

This paper presents asymptotically equal lower and upper bounds for the number of parallel I/O operations required to perform bit-matrix-multiply/complement (BMMC) permutations on the Parallel Disk Model proposed by Vitter and Shriver. A BMMC permutation maps a source index to a target index by an affine transformation over GF(2), where the source and target indices are treated as bit vectors. The class of BMMC permutations includes many common permutations, such as matrix transposition (when dimensions are powers of 2), bit-reversal permutations, vector-reversal permutations, hypercube permutations, matrix reblocking, Gray-code permutations, and inverse Gray-code permutations. The upper bound improves upon the asymptotic bound in the previous best known BMMC algorithm and upon the constant factor in the previous best known bit-permute/complement (BPC) permutation algorithm. The algorithm achieving the upper bound uses basic linear-algebra techniques to factor the characteristic matrix for the BMMC permutation into a product of factors, each of which characterizes a permutation that can be performed in one pass over the data.

The factoring uses new subclasses of BMMC permutations: memoryload-dispersal (MLD) permutations and their inverses. These subclasses extend the catalog of one-pass permutations.

Although many BMMC permutations of practical interest fall into subclasses that might be explicitly invoked within the source code, this paper shows how to quickly detect whether a given vector of target addresses specifies a BMMC permutation. Thus, one can determine efficiently at run time whether a permutation to be performed is BMMC and then avoid the general-permutation algorithm and save parallel I/Os by using the BMMC permutation algorithm herein.
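
To make the definition concrete, the following Python sketch treats an n-bit source index x as a bit vector and computes the target index y = A x XOR c over GF(2). It also includes a naive in-memory check of whether a vector of target addresses is a BMMC permutation. The function names (`bmmc_target`, `detect_bmmc`) and the encoding of the characteristic matrix A as a list of column bitmasks are assumptions made for this illustration. The check reads every target address in memory, so it is not the paper's I/O-efficient detection method, only a small illustration of the property being detected.

```python
def bmmc_target(x, cols, c):
    """Return A*x XOR c over GF(2) for source index x.

    cols[j] is column j of the characteristic matrix A, packed as an
    integer bitmask (bit i of cols[j] is entry A[i][j]); c is the
    complement vector. Bit 0 is the least significant bit of an index.
    This encoding is an assumption of the sketch.
    """
    y = c
    j = 0
    while x:
        if x & 1:
            y ^= cols[j]  # adding column j over GF(2) is XOR
        x >>= 1
        j += 1
    return y


def detect_bmmc(targets):
    """Naive check: return (cols, c) if targets is a BMMC permutation, else None."""
    n_items = len(targets)
    n = n_items.bit_length() - 1
    if n_items == 0 or (1 << n) != n_items:
        return None  # the number of records must be a power of 2
    c = targets[0]  # A*0 XOR c = c
    # The image of the unit vector e_j is (column j of A) XOR c.
    cols = [targets[1 << j] ^ c for j in range(n)]
    # Every other address must agree with the candidate affine map.
    for x in range(n_items):
        if bmmc_target(x, cols, c) != targets[x]:
            return None
    # The map is a permutation exactly when A is nonsingular, which is
    # equivalent to the targets covering 0..N-1 without repeats.
    if sorted(targets) != list(range(n_items)):
        return None
    return cols, c


# Bit reversal on 3-bit indices: column j of A is the unit vector e_(n-1-j).
n = 3
bit_reversal_cols = [1 << (n - 1 - j) for j in range(n)]
bit_reversal = [bmmc_target(x, bit_reversal_cols, 0) for x in range(1 << n)]
# bit_reversal == [0, 4, 2, 6, 1, 5, 3, 7]
```

In this encoding, vector reversal uses the identity matrix with an all-ones complement vector, so `detect_bmmc([7, 6, 5, 4, 3, 2, 1, 0])` recovers `([1, 2, 4], 7)`. Swapping only the last two entries of the identity permutation gives a vector that the check rejects.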