An mth-order recurrence problem is defined as the computation of the series x_1, x_2, ..., x_N, where x_i = f_i(x_{i-1}, ..., x_{i-m}) for some function f_i. This paper uses a technique called recursive doubling in an algorithm for solving a large class of recurrence problems on parallel computers such as the Illiac IV. Recursive doubling involves the splitting of the computation of a function into two equally complex subfunctions whose evaluation can be performed simultaneously in two separate processors. Successive splitting of each of these subfunctions spreads the computation over more processors. This algorithm can be applied to any recurrence equation of the form x_i = f(b_i, g(a_i, x_{i-1})) where f and g are functions that satisfy certain distributive and associative-like properties. Although this recurrence is first order, all linear mth-order recurrence equations can be cast into this form. Suitable applications include linear recurrence equations, polynomial evaluation, several nonlinear problems, the determination of the maximum or minimum of N numbers, and the solution of tridiagonal linear equations. The resulting algorithm computes the entire series x_1, ..., x_N in time proportional to ⌈log_2 N⌉ on a computer with N-fold parallelism. On a serial computer, computation time is proportional to N.
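As an illustration of the idea, the sketch below applies recursive doubling to one specific instance of the general form: the first-order linear recurrence x_i = a_i * x_{i-1} + b_i, that is, f taken as addition and g as multiplication. This choice, the function names `serial_recurrence` and `recursive_doubling`, and the initial value `x0` are assumptions made for the example and are not taken from the paper. The parallel steps are simulated one after another in plain Python; on a machine with N-fold parallelism, each pass of the inner loop would run as a single simultaneous step.

```python
from math import ceil, log2


def serial_recurrence(a, b, x0):
    """Reference solution: x_i = a_i * x_{i-1} + b_i, computed in N serial steps."""
    xs = []
    x = x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def recursive_doubling(a, b, x0):
    """Solve the same recurrence in ceil(log2 N) simulated parallel steps.

    Invariant: (A[i], B[i]) describes the affine map that carries an earlier
    term to x_i, i.e. x_i = A[i] * x_{i-span} + B[i]. Each step composes the
    map at i with the map at i - stride, doubling the span it covers.
    """
    n = len(a)
    A, B = list(a), list(b)
    stride, steps = 1, 0
    while stride < n:
        # Read only from the old lists so that all updates within a step
        # behave as if they were performed simultaneously.
        new_A, new_B = A[:], B[:]
        for i in range(stride, n):
            new_A[i] = A[i] * A[i - stride]
            new_B[i] = A[i] * B[i - stride] + B[i]
        A, B = new_A, new_B
        stride *= 2
        steps += 1
    # Every map now starts from x0, so one final evaluation yields each x_i.
    return [A[i] * x0 + B[i] for i in range(n)], steps


if __name__ == "__main__":
    a = [2, 3, 1, 4, 2]
    b = [1, 0, 5, 2, 3]
    xs, steps = recursive_doubling(a, b, x0=1)
    print(xs, steps)
    print(xs == serial_recurrence(a, b, x0=1), steps == ceil(log2(len(a))))
```

The composition of two affine maps is associative, which is what allows the terms to be regrouped in this way; it serves here as a stand-in for the distributive and associative-like properties the abstract requires of f and g. The sketch performs more total arithmetic than the serial loop, trading extra work for a logarithmic number of steps.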