The mapping of algorithms structured as depth-p nested FOR loops into special-purpose systolic VLSI linear arrays is addressed. Linear functions are used to transform the original sequential algorithms into a form suitable for parallel execution on linear arrays. A feasible mapping is derived by identifying formal criteria that both the original sequential algorithm and the proposed transformation function must satisfy. The methodology is illustrated by synthesizing algorithms for matrix multiplication and a version of the Warshall-Floyd transitive closure algorithm.
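The idea of a linear mapping and its feasibility criteria can be illustrated with a minimal sketch. It is not the paper's own formulation: the dependence vectors, the two criteria checked (causality and conflict freedom), and the names `DEPENDENCES`, `dot`, and `is_feasible` are assumptions taken from the common textbook treatment of systolic mappings. Each index point (i, j, k) of a depth-3 matrix multiplication loop nest is assigned a time step by a schedule vector and a processor in the linear array by an allocation vector.

```python
from itertools import product

# Assumed uniform dependence vectors for a pipelined matrix-multiplication
# loop nest over (i, j, k). Hypothetical textbook form, not from the paper.
DEPENDENCES = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def dot(u, v):
    """Inner product of two equal-length integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_feasible(schedule, allocation, n, dependences=DEPENDENCES):
    """Check two assumed criteria for mapping an n*n*n loop nest.

    time(point)      = schedule . point
    processor(point) = allocation . point
    """
    # Causality: every dependence must advance time by at least one step.
    if any(dot(schedule, d) <= 0 for d in dependences):
        return False

    # Conflict freedom: no two index points may occupy the same
    # (time, processor) slot of the linear array.
    seen = set()
    for point in product(range(n), repeat=3):
        slot = (dot(schedule, point), dot(allocation, point))
        if slot in seen:
            return False
        seen.add(slot)
    return True


if __name__ == "__main__":
    n = 3
    print(is_feasible((1, n, 1), (0, 0, 1), n))  # candidate mapping
    print(is_feasible((1, 1, 1), (0, 0, 1), n))  # collides on (time, processor)
```

Under these assumptions, the schedule (1, n, 1) with allocation (0, 0, 1) passes both checks for n = 3, while the schedule (1, 1, 1) with the same allocation fails because points such as (0, 1, k) and (1, 0, k) land on the same processor at the same time. The exhaustive check is practical only for small n; it is meant to make the criteria concrete, not to replace a formal derivation.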