This paper presents a technique for partitioning and mapping algorithms into VLSI systolic arrays. Algorithm partitioning is essential when a computational problem is larger than the VLSI array intended for it. Computational models are introduced for systolic arrays and iterative algorithms. First, we discuss the mapping of algorithms into arbitrarily large VLSI arrays; this mapping is based on algorithm transformations. Then, we present an approach to algorithm partitioning that is also based on algorithm transformations: the algorithm index set is divided into bands, and these bands are mapped into the processor space. The partitioning and mapping technique developed throughout the paper is summarized as a six-step procedure. A computer program implementing this procedure was developed, and some results obtained with it are presented.
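The band idea can be illustrated with a minimal sketch. It is not the paper's six-step procedure: the function name `partition_into_bands`, the schedule vector `pi`, the space vector `s`, and the square two-dimensional index set are all assumptions chosen for illustration, and data dependencies are not checked. Each index point is first assigned a cell in an unbounded linear array, and that cell number is then folded onto `m` physical cells; points whose virtual cells fall in the same block of `m` form one band, and bands would run one after another.

```python
from itertools import product


def partition_into_bands(n, m, pi=(1, 1), s=(0, 1)):
    """Map index points (i, j), 0 <= i, j < n, onto a linear array of m cells.

    pi and s are hypothetical time and space mapping vectors (assumptions,
    not values taken from the paper). Returns {(i, j): (band, step, cell)}.
    """
    schedule = {}
    for i, j in product(range(n), repeat=2):
        virtual_cell = s[0] * i + s[1] * j    # cell in an unbounded array
        band, cell = divmod(virtual_cell, m)  # band number and physical cell
        step = pi[0] * i + pi[1] * j          # time step inside the band
        schedule[(i, j)] = (band, step, cell)
    return schedule


if __name__ == "__main__":
    # A 4 x 4 index set folded onto 2 cells gives two bands of 8 points each.
    for point, slot in sorted(partition_into_bands(4, 2).items()):
        print(point, "->", slot)
```

With the default vectors, no two index points share the same band, time step, and cell, so each physical cell performs at most one computation per step within a band. Other choices of `pi` and `s` would need that property, and the dependence constraints, checked separately.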