Most existing methods of mapping algorithms into processor arrays are restricted to the case where n-dimensional algorithms, or algorithms with n nested loops, are mapped into (n-1)-dimensional arrays. In practice, however, it is also of interest to map n-dimensional algorithms into (k-1)-dimensional arrays, where k is less than n. A computational conflict occurs if two or more computations of an algorithm are mapped into the same processor and the same execution time. Based on the Hermite normal form of the mapping matrix, necessary and sufficient conditions are derived to identify mappings without computational conflicts. These conditions are used to find time mappings of n-dimensional algorithms into (k-1)-dimensional arrays, with k less than n, that are free of computational conflicts. For some applications, the resulting mapping is time-optimal.
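The notion of a computational conflict can be illustrated with a small brute-force check. The sketch below is not the Hermite-normal-form test described in the abstract; it simply enumerates every index point and looks for two points with the same image under the mapping matrix. The names `apply_mapping` and `find_conflicts`, the box-shaped index set `0 <= j_i <= mu_i`, and the convention that the first row of `T` is the time schedule while the remaining k-1 rows give processor coordinates are all assumptions made for illustration.

```python
from itertools import product


def apply_mapping(T, j):
    """Return the matrix-vector product T * j as a tuple.

    T is a k x n integer matrix given as a list of rows; j is an index point.
    """
    return tuple(sum(t * x for t, x in zip(row, j)) for row in T)


def find_conflicts(T, bounds):
    """Brute-force search for computational conflicts.

    Assumed convention (not taken from the source): row 0 of T is the time
    schedule and rows 1..k-1 are processor coordinates, so two index points
    conflict exactly when they have the same image T * j.
    The index set is assumed to be the box 0 <= j_i <= bounds[i].
    Returns a list of (earlier_point, later_point) pairs that collide.
    """
    seen = {}        # image (time, processor...) -> first index point seen
    conflicts = []
    for j in product(*(range(m + 1) for m in bounds)):
        image = apply_mapping(T, j)
        if image in seen:
            conflicts.append((seen[image], j))
        else:
            seen[image] = j
    return conflicts


# A 3-dimensional algorithm (n = 3) mapped into a linear array (k = 2).
T_bad = [[1, 1, 1],   # time = j1 + j2 + j3
         [1, 0, 0]]   # processor = j1
T_ok = [[1, 3, 9],    # time = j1 + 3*j2 + 9*j3, unique on the box below
        [1, 0, 0]]    # processor = j1

bad = find_conflicts(T_bad, (2, 2, 2))
ok = find_conflicts(T_ok, (2, 2, 2))
print(len(bad) > 0, ok == [])
```

With `T_bad`, the points (0, 0, 1) and (0, 1, 0) are both scheduled at time 1 on processor 0, which is a conflict. `T_ok` avoids conflicts on this index set only because its schedule row is already injective there, at the cost of a much longer total execution time; finding conflict-free mappings that are also time-optimal is the harder problem the abstract addresses.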