Much research has recently been done on processor interconnection schemes for parallel computers. These interconnection schemes allow certain permutations to be performed in less than linear time, typically O(log N), O(log² N), or O(√N) for a vector of N elements and N processors. In this paper we show that many permutations can also be performed in less than linear time on a machine wit