Performing permutations of data efficiently on SIMD computers is important for the high-speed execution of parallel algorithms. In this correspondence we consider realizing permutations such as the perfect shuffle, matrix transpose, bit reversal, the class of bit-permute-complement (BPC) permutations, and the classes of Omega and inverse Omega permutations on $N = 2^{2n}$ processors connected by an Illiac IV-type interconnection network, in which each processor is linked to the processors at distances of $\pm 1$ and $\pm\sqrt{N}$. We show that the minimum number of data transfer operations required to realize any of these permutations on such a network is $2(\sqrt{N} - 1)$. We provide a general three-phase strategy for realizing permutations and derive routing algorithms that perform the perfect shuffle, Omega, inverse Omega, bit-reversal, and matrix-transpose permutations in $2(\sqrt{N} - 1)$ steps. Our approach is simple and, unlike previous approaches, exploits the topology of the Illiac IV-type network to realize these permutations with the optimal number of data transfers. The strategy is also general: it can realize any permutation in $3(\sqrt{N} - 1)$ steps.
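The sketch below illustrates the objects this abstract works with. It does not implement the paper's three-phase routing strategy. It assumes that processors are numbered 0 to N − 1 in row-major order on a √N × √N grid, that N = 2^bits with bits = 2n, and that the ±1 and ±√N links wrap around modulo N, as in the original Illiac IV. The permutations are written as BPC bit mappings, and a breadth-first search computes how many hops each element must travel. All function names (`bpc`, `illiac_distances`, `max_hop_distance`, and so on) are hypothetical and do not come from the paper.

```python
# Hypothetical sketch (not the paper's algorithm): permutations named in the
# abstract as bit mappings, plus hop distances on an Illiac IV-type network.
# Assumes processors 0..N-1 in row-major order, N = 2**bits with bits = 2n,
# and links to p +/- 1 and p +/- sqrt(N) that wrap around modulo N.
from collections import deque
from math import isqrt
from typing import Callable


def bpc(index: int, bits: int, perm: list[int], complement: int = 0) -> int:
    """Bit-permute-complement: bit i of `index` moves to bit perm[i],
    then the bits set in `complement` are inverted."""
    dest = 0
    for i in range(bits):
        if (index >> i) & 1:
            dest |= 1 << perm[i]
    return dest ^ complement


def perfect_shuffle(index: int, bits: int) -> int:
    # Cyclic left rotation of the index bits: bit i -> bit (i + 1) mod bits.
    return bpc(index, bits, [(i + 1) % bits for i in range(bits)])


def bit_reversal(index: int, bits: int) -> int:
    # Bit i -> bit (bits - 1 - i).
    return bpc(index, bits, [bits - 1 - i for i in range(bits)])


def transpose(index: int, bits: int) -> int:
    # Row-major (row, col) -> (col, row) on a sqrt(N) x sqrt(N) grid:
    # swap the high and low halves of the index bits.
    half = bits // 2
    return bpc(index, bits, [(i + half) % bits for i in range(bits)])


def illiac_distances(n_procs: int) -> list[int]:
    """Breadth-first search from processor 0. The network is circulant, so
    the hop distance from p to q equals dist[(q - p) % n_procs]."""
    side = isqrt(n_procs)
    dist = [-1] * n_procs
    dist[0] = 0
    queue = deque([0])
    while queue:
        p = queue.popleft()
        for step in (1, -1, side, -side):
            q = (p + step) % n_procs
            if dist[q] < 0:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


def max_hop_distance(perm_fn: Callable[[int, int], int], bits: int) -> int:
    """Largest number of hops any element must travel. Each transfer moves a
    datum across at most one link, so this is a lower bound on the steps."""
    n_procs = 1 << bits
    dist = illiac_distances(n_procs)
    return max(dist[(perm_fn(i, bits) - i) % n_procs] for i in range(n_procs))


if __name__ == "__main__":
    bits = 4  # N = 16, a 4 x 4 array
    print([perfect_shuffle(i, 3) for i in range(8)])
    print(max(illiac_distances(1 << bits)))
    print(max_hop_distance(transpose, bits))
```

The search reports a network diameter of 3 for N = 16 and 7 for N = 64, which is √N − 1 in both cases. For N = 16, the transpose reaches that diameter, giving a per-element bound of 3. This is half of the abstract's 2(√N − 1) = 6. Hop distance alone therefore cannot account for the stronger bound. That bound presumably reflects the SIMD setting, in which all processors transfer data in the same direction during a step.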