A Fast, Efficient Parallel-Acting Method of Generating Functions Defined by Power Series, Including Logarithm, Exponential, and Sine, Cosine

Authors:
David M. Mandelbaum; Stefanie G. Mandelbaum
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1996

References (13):

Generation of a Precise Binary Logarithm with Difference Grouping Programmable Logic Array. IEEE Transactions on Computers
A Systematic Method for Division with High Average Bit Skipping. IEEE Transactions on Computers
Table-driven implementation of the logarithm function in IEEE floating-point arithmetic. ACM Transactions on Mathematical Software (TOMS)
Introduction to parallel algorithms and architectures: arrays, trees, hypercubes
Cost-efficient high-radix division. Journal of VLSI Signal Processing Systems - Special issue: computer arithmetic
Fast Division Using Accurate Quotient Approximations to Reduce the Number of Iterations. IEEE Transactions on Computers - Special issue on computer arithmetic
High-radix algorithms for high-order arithmetic operations
The art of computer programming, volume 2 (3rd ed.): seminumerical algorithms
Computer Arithmetic: Principles, Architecture and Design
Error Coding for Arithmetic Processors
Combinatorial Algorithms: For Computers and Hard Calculators
Some Results on a SRT Type Division Scheme. IEEE Transactions on Computers
Approximations for Digital Computers

Cited by (5):

Hardware Starting Approximation Method and Its Application to the Square Root Operation. IEEE Transactions on Computers
Technology Trends and Adaptive Computing. FPL '01 Proceedings of the 11th International Conference on Field-Programmable Logic and Applications
FPGA-based System for Real-Time Video Texture Analysis. Journal of Signal Processing Systems
Adaptable, Fast, Area-Efficient Architecture for Logarithm Approximation with Arbitrary Accuracy on FPGA. Journal of Signal Processing Systems
Multi-Gb/s LDPC code design and implementation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Abstract

A fundamental parallel procedure for implementing certain algorithms is by means of trees and arrays [1]. A method of generating any function defined by a power series in a fast, efficient, parallel-acting manner using trees and arrays is described. The power series considered can be written as f(Y) = a_0 + a_1 Y + a_2 Y^2 + ..., where Y = v_1 x + v_2 x^2 + ... + v_k x^k, with each v_i in {0, 1}, is a binary fraction when x = 1/2. The power series must be expanded into individual terms c x^i. These terms are then transformed into weighted binary terms. Two methods are given to obtain all the individual terms (including coefficients) associated with each power of x. The hardware required for implementation is a tree similar to a Wallace or Dadda tree used for parallel multiplication of two binary numbers. Despite the multiplicity of terms required, Boolean logic methods reduce the tree dimensions in many cases, so that the total tree required is smaller than an existing multiplier tree. In that case, Schwarz and Flynn [13], [15] have shown that the required tree can be superimposed on the existing multiplier tree in a multiplexed manner with relatively little increase in hardware. The generation of the logarithmic function is described in detail. Comparisons with other methods are made for the case of 11-bit accuracy of the logarithm. Using a figure of merit of latency times area (number of transistors), estimates show that the superposition scheme gives the best (smallest) figure of merit. For 11-bit accuracy, the superposition scheme requires only about 480 additional gates to be superimposed upon a 41-bit or larger multiplier, and the speed of operation is that of the multiplier.
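
The expansion step in the abstract can be illustrated with a small software sketch. It is not either of the two methods the paper gives, and it does not model the tree hardware or the conversion to weighted binary terms. It is only a brute-force expansion of a truncated series into terms of the form c * (product of bits) * x^i, using the Boolean identity v * v = v for a bit v to merge repeated bits. The function names, the choice k = 3, and the truncated ln(1 + Y) coefficients are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def y_terms(k):
    """Y = v1*x + v2*x^2 + ... + vk*x^k as {(bit set, power of x): coefficient}."""
    return {(frozenset([i]), i): Fraction(1) for i in range(1, k + 1)}

def multiply(p, q):
    """Multiply two expansions; repeated bits merge because v*v = v for v in {0, 1}."""
    out = {}
    for (s1, e1), c1 in p.items():
        for (s2, e2), c2 in q.items():
            key = (s1 | s2, e1 + e2)
            out[key] = out.get(key, Fraction(0)) + c1 * c2
    return out

def expand_series(coeffs, k):
    """Expand sum(coeffs[n] * Y^n) into individual terms c * (bits) * x^i."""
    y = y_terms(k)
    power = {(frozenset(), 0): Fraction(1)}  # Y^0
    total = {}
    for a in coeffs:
        for key, c in power.items():
            total[key] = total.get(key, Fraction(0)) + a * c
        power = multiply(power, y)
    return {key: c for key, c in total.items() if c != 0}

def evaluate(terms, bits, x=Fraction(1, 2)):
    """Evaluate the expanded terms for a bit pattern; bits[i-1] holds v_i."""
    result = Fraction(0)
    for (s, e), c in terms.items():
        if all(bits[i - 1] for i in s):
            result += c * x ** e
    return result

def direct(coeffs, bits, x=Fraction(1, 2)):
    """Reference value: form Y from the bits, then sum the truncated series."""
    y = sum(b * x ** (i + 1) for i, b in enumerate(bits))
    return sum(a * y ** n for n, a in enumerate(coeffs))

if __name__ == "__main__":
    # Assumed example: first terms of ln(1 + Y) = Y - Y^2/2 + Y^3/3 - ...
    coeffs = [Fraction(0), Fraction(1), Fraction(-1, 2), Fraction(1, 3)]
    terms = expand_series(coeffs, k=3)
    for bits in product((0, 1), repeat=3):
        assert evaluate(terms, bits) == direct(coeffs, bits)
    print(len(terms), "distinct terms after merging repeated bits")
```

Exact rational arithmetic is used so that the expanded form and the direct evaluation can be compared for equality on every bit pattern; a hardware realization would instead work with fixed-width binary weights.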