The Organization of Computations for Uniform Recurrence Equations

Authors:
Richard M. Karp; Raymond E. Miller; Shmuel Winograd
Affiliations:
IBM Watson Research Center, Yorktown Heights, New York (all three authors)
Venue:
Journal of the ACM (JACM)
Year:
1967

Abstract

A set of equations in the quantities a_i(p), where i = 1, 2, ..., m and p ranges over a set R of lattice points in n-space, is called a system of uniform recurrence equations if the following property holds: if p and q are in R and w is an integer n-vector, then a_i(p) depends directly on a_j(p - w) if and only if a_i(q) depends directly on a_j(q - w). Finite-difference approximations to systems of partial differential equations typically lead to such recurrence equations. The structure of such a system is specified by a dependence graph G having m vertices, in which the directed edges are labeled with integer n-vectors. For certain choices of the set R, necessary and sufficient conditions on G are given for the existence of a schedule to compute all the quantities a_i(p) explicitly from their defining equations. Properties of such schedules, such as the degree to which computation can proceed "in parallel," are characterized. These characterizations depend on a certain iterative decomposition of a dependence graph into subgraphs. Analogous results concerning implicit schedules are also given.
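
The dependence-graph description above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's construction: R is taken to be a finite box, values outside R are treated as given boundary data, equations are indexed from 0 rather than 1, and the function name `free_schedule` and the `(i, j, w)` edge encoding are hypothetical. The sketch assigns each quantity the earliest step at which all of its direct dependences inside R are available, and reports failure when some quantity depends on itself through a cycle.

```python
from collections import deque
from itertools import product


def free_schedule(m, edges, shape):
    """Earliest-step schedule for a uniform recurrence on a finite box.

    m      -- number of equations, indexed 0..m-1 (assumption: 0-based)
    edges  -- list of (i, j, w): a_i(p) depends directly on a_j(p - w),
              for every p, which is the uniformity property
    shape  -- extents of the box R = {0..shape[0]-1} x ... (assumption)

    Returns a dict mapping (i, p) to its step, or None when the
    dependences inside R contain a cycle (no explicit schedule exists).
    """
    points = list(product(*(range(s) for s in shape)))
    region = set(points)
    nodes = [(i, p) for i in range(m) for p in points]

    # Expand the labeled graph G into one node per (equation, point).
    succ = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for i, j, w in edges:
        for p in points:
            q = tuple(pk - wk for pk, wk in zip(p, w))
            if q in region:  # q outside R counts as boundary data
                succ[(j, q)].append((i, p))
                indeg[(i, p)] += 1

    # Topological sweep; each node gets the length of its longest
    # chain of predecessors, i.e. the earliest step it can run at.
    step = {v: 0 for v in nodes}
    queue = deque(v for v in nodes if indeg[v] == 0)
    done = 0
    while queue:
        v = queue.popleft()
        done += 1
        for u in succ[v]:
            step[u] = max(step[u], step[v] + 1)
            indeg[u] -= 1
            if indeg[u] == 0:
                queue.append(u)

    return step if done == len(nodes) else None


# One equation depending on its west and south neighbours.
wavefront = free_schedule(1, [(0, 0, (1, 0)), (0, 0, (0, 1))], (3, 3))
for (i, p), t in sorted(wavefront.items(), key=lambda kv: kv[1]):
    print(f"a_{i}{p} at step {t}")
```

For the single equation with dependence vectors (1, 0) and (0, 1), the earliest step for the point (x, y) is x + y, so all points on one anti-diagonal can be computed in parallel. An edge labeled with the zero vector from an equation to itself makes every quantity depend on itself, and the function then returns `None`.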