Many computationally intensive algorithms are represented as n-dimensional (n-D) nested loop algorithms. Systolic-array-based projections, and their modifications involving multidimensional vector space representations, have been used to realise optimal VLSI designs for deeply nested loop problems. The approaches employed so far search the feasible solution space extensively using heuristic methods and yield near-optimal solutions. This paper presents a method that identifies the optimal solution directly, through a logical procedure. The new allocation method is shown to revolve around the computational expression and the sub-space in which it lies. The array of processing elements, termed the PE array, is allocated to the identified computational sub-space, which is of strictly lower dimension than the n-D problem space. The proposed optimal allocation procedure is first explained using the 3-D matrix/matrix multiplication (MMM) problem. The effectiveness of the method for higher-dimensional problems is demonstrated through an illustrative example, the 6-D full search block motion (FSBM) algorithm. The design possibilities of this mapping procedure are explored analytically, and the cost constraints, termed the figure of merit (FoM) of the design, are derived for the various design trade-offs for the MMM and 6-D FSBM problems. An experimental methodology is developed using a hyper-graph model to represent the PE allocation to a particular sub-space of the n-D problem space. The advantage of our mapping procedure is illustrated by considering two cases, namely, first an allocation represented by a vertex cover that covers the nodes of the identified computational (n-x)-D sub-space, where x