A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations

Authors:
Peter M. Kogge; Harold S. Stone
Affiliations:
Department of Electrical Engineering, Digital Systems Laboratory, Stanford University, Stanford, Calif. / Systems Architecture Department, IBM Corporation, Owego, N.Y. 13827; Department of Electrical Engineering and the Department of Computer Science, Digital Systems Laboratory, Stanford University, Stanford, Calif.
Venue:
IEEE Transactions on Computers
Year:
1973

Abstract

An mth-order recurrence problem is defined as the computation of the series x_1, x_2, ..., x_N, where x_i = f_i(x_{i-1}, ..., x_{i-m}) for some function f_i. This paper uses a technique called recursive doubling in an algorithm for solving a large class of recurrence problems on parallel computers such as the ILLIAC IV. Recursive doubling splits the computation of a function into two equally complex subfunctions that can be evaluated simultaneously on two separate processors. Successive splitting of each of these subfunctions spreads the computation over more processors. The algorithm can be applied to any recurrence equation of the form x_i = f(b_i, g(a_i, x_{i-1})), where f and g are functions that satisfy certain distributive and associative-like properties. Although this recurrence is first order, all linear mth-order recurrence equations can be cast into this form. Suitable applications include linear recurrence equations, polynomial evaluation, several nonlinear problems, the determination of the maximum or minimum of N numbers, and the solution of tridiagonal linear equations. The resulting algorithm computes the entire series x_1, ..., x_N in time proportional to ⌈log_2 N⌉ on a computer with N-fold parallelism. On a serial computer, computation time is proportional to N.
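
As an illustration of the idea in the abstract, the following is a minimal Python sketch of recursive doubling for one special case: the first-order linear recurrence x_i = a_i * x_{i-1} + b_i, which takes f as addition and g as multiplication. It is not the paper's own formulation. The function names (solve_serial, solve_recursive_doubling) and the sample data are hypothetical, and the inner loop only simulates on one processor what an N-fold parallel machine would do in a single step. Each element holds a pair (A, B) describing the map x -> A*x + B, and every step composes that map with the one held d positions earlier, doubling the span it covers.

```python
def solve_serial(a, b, x0):
    """Reference solution: evaluate x_i = a_i * x_{i-1} + b_i one term at a time."""
    xs = []
    x = x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def solve_recursive_doubling(a, b, x0):
    """Solve the same recurrence by recursive doubling.

    Returns (xs, steps), where steps is the number of simulated parallel
    steps. Each pair (A[i], B[i]) represents the map x -> A[i]*x + B[i].
    """
    n = len(a)
    A, B = list(a), list(b)
    steps = 0
    d = 1
    while d < n:
        # One parallel step: every i >= d reads the OLD values at i and i - d,
        # so the updates are written to fresh copies.
        new_A, new_B = A[:], B[:]
        for i in range(d, n):
            # Compose: apply the earlier map (index i - d) first, then map i.
            new_A[i] = A[i] * A[i - d]
            new_B[i] = A[i] * B[i - d] + B[i]
        A, B = new_A, new_B
        d *= 2
        steps += 1
    # Element i now maps x0 directly to x_i.
    return [A[i] * x0 + B[i] for i in range(n)], steps


if __name__ == "__main__":
    a = [2, 3, 1, -1, 4]   # sample coefficients (hypothetical)
    b = [1, 0, 5, 2, -3]
    xs, steps = solve_recursive_doubling(a, b, x0=1)
    print(xs, steps)
    print(xs == solve_serial(a, b, x0=1))
```

For N = 5 the loop runs with d = 1, 2, 4, that is, three steps, which matches ⌈log_2 N⌉. The sketch relies on composition of the maps x -> A*x + B being associative, which is the kind of property the abstract requires of f and g.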