In this paper, we consider the problem of restructuring or transforming algorithms to efficiently use a single-stage interconnection network. All algorithms contain some freedom in the way they are mapped to a machine. We use this freedom to show that superior interconnection efficiency can be obtained by implementing the interconnections required by the algorithm within the context of the algorithm rather than attempting to implement each request individually. The interconnection considered is the bidirectional shuffle-shift. It is shown that two algorithm transformations are useful for implementing several lower triangular and tridiagonal system algorithms on the shuffle-shift network. Of the 14 algorithms considered, 85% could be implemented on this network. The transformations developed to produce these results are described. They are general-purpose in nature and can be applied to a much larger class of algorithms.
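For readers unfamiliar with the interconnection, the following is a minimal sketch of the routing functions a bidirectional shuffle-shift network is usually taken to provide. It assumes N = 2^n processors, that "bidirectional" means both the shuffle and its inverse as well as shifts of +1 and -1 are available, and that shifts wrap around modulo N. The function names are illustrative and do not come from the paper.

```python
def shuffle(i, n_bits):
    """Perfect shuffle: rotate the n_bits-bit address of processor i left by one."""
    mask = (1 << n_bits) - 1
    return ((i << 1) & mask) | (i >> (n_bits - 1))


def unshuffle(i, n_bits):
    """Inverse shuffle: rotate the n_bits-bit address of processor i right by one."""
    return (i >> 1) | ((i & 1) << (n_bits - 1))


def shift(i, n_bits, step=1):
    """Shift: move to the processor `step` positions away, wrapping modulo N (assumed)."""
    return (i + step) % (1 << n_bits)


def neighbors(i, n_bits):
    """Processors reachable from i in a single pass through the assumed network."""
    return {
        shuffle(i, n_bits),
        unshuffle(i, n_bits),
        shift(i, n_bits, 1),
        shift(i, n_bits, -1),
    }


if __name__ == "__main__":
    n_bits = 3  # 8 processors
    for p in range(1 << n_bits):
        print(p, sorted(neighbors(p, n_bits)))
```

Under these assumptions each processor has at most four single-pass destinations, which is why an algorithm's data movements generally have to be restructured to match the network rather than routed request by request.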