A direct method for optimal VLSI realization of deeply nested n-D loop problems

Authors:
B. Bala Tripura Sundari;T. R. Padmanabhan
Venue:
Microprocessors & Microsystems
Year:
2013

Abstract

Many computationally intensive algorithms are represented as n-dimensional (n-D) nested loop algorithms. Systolic-array-based projections and their modifications, which involve multidimensional vector space representations, have been used to realise the optimal VLSI design of deeply nested loop problems. The approaches employed so far involve an extensive search of the feasible solution space through heuristic methods and yield near-optimal solutions. This paper presents a method that identifies the optimal solution directly, through a logical procedure. The new allocation method is shown to revolve around the computational expression and the sub-space in which it lies. The array of processing elements, termed the PE array, is allocated to the identified computational sub-space, which is strictly of lower dimension than the n-D problem space. The proposed optimal allocation procedure is first explained using the 3-D matrix/matrix multiplication (MMM) problem. The effectiveness of the method for higher-dimensional problems is demonstrated through an illustrative example flow of the 6-D full search block motion (FSBM) algorithm. The various design possibilities of this mapping procedure are explored analytically, and the cost constraints, termed the figure of merit (FoM) of the design, are derived for the various design trade-offs for the MMM and 6-D FSBM problems. An experimental methodology is developed using a hypergraph model to represent the PE allocation to a particular sub-space of the n-D problem space. The advantage of our mapping procedure is illustrated by considering two cases: first, an allocation represented by a vertex cover that covers the nodes of the identified computational (n-x)-D sub-space, where x
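
To make concrete the idea of allocating a PE array to a sub-space of lower dimension than the problem space, the following minimal Python sketch maps the 3-D MMM loop nest onto a 2-D PE array. It is not the allocation procedure of the paper: the projection along k, the linear schedule t = i + j + k, and the function name `systolic_mmm` are all assumptions taken from the classic textbook systolic mapping, chosen only for illustration.

```python
from itertools import product


def systolic_mmm(A, B):
    """Simulate C = A x B on an n x n PE array.

    Illustrative textbook mapping only (an assumption, not the paper's method).
    """
    n = len(A)
    # Iteration space: every point (i, j, k) of the 3-D loop nest.
    # Allocation (assumed): project along k, so point (i, j, k) runs on PE (i, j).
    # Schedule (assumed): point (i, j, k) executes at time step t = i + j + k.
    events = {}
    for i, j, k in product(range(n), repeat=3):
        pe, t = (i, j), i + j + k
        # A valid space-time mapping never puts two points on one PE at one time.
        assert (pe, t) not in events, "space-time conflict"
        events[(pe, t)] = (i, j, k)

    # Replay the iteration points in time order, accumulating into C.
    C = [[0] * n for _ in range(n)]
    for pe, t in sorted(events, key=lambda e: e[1]):
        i, j, k = events[(pe, t)]
        C[i][j] += A[i][k] * B[k][j]

    num_pes = len({pe for pe, _ in events})
    latency = max(t for _, t in events) + 1
    return C, num_pes, latency
```

For n x n operands this sketch uses n^2 PEs, one per point of the 2-D (i, j) sub-space, instead of one per point of the n^3 iteration space. The PE count and the latency it returns are the kind of quantities a figure of merit would trade off.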