Systolic Super Summation

Authors:
Peter R. Cappello;Willard L. Miranker
Affiliations:
Princeton University, Princeton, NJ;New York University, New York, NY
Venue:
IEEE Transactions on Computers
Year:
1988

Citing 6

Features of a hardware implementation of an optimal arithmetic
Proc. of the symposium on A new approach to scientific computation

The arithmetic of the digital computer: A new approach
SIAM Review

The Area-Time Complexity of Binary Multiplication
Journal of the ACM (JACM)

Computer Arithmetic in Theory and Practice
Computer Arithmetic in Theory and Practice

Area-time complexity for VLSI
STOC '79 Proceedings of the eleventh annual ACM symposium on Theory of computing

A proposed standard for binary floating point arithmetic
ACM SIGNUM Newsletter

Cited 2

Systolic Super Summation with Reduced Hardware
IEEE Transactions on Computers

Validated Roundings of Dot Products by Sticky Accumulation
IEEE Transactions on Computers

Quantified Score

Hi-index	14.99

Abstract

A principal limitation in accuracy for scientific computation performed with floating-point arithmetic is due to the computation of repeated sums, such as those that arise in inner products. A systolic super summer of cellular design is proposed for the high-throughput performance of repeated sums of floating-point numbers. The apparatus receives pipelined inputs of streams of summands from one or many sources. The floating-point summands are converted into a fixed-point form by a sieve-like pipelined cellular packet-switching device with signal combining. The emerging fixed-point numbers are then summed in a corresponding network of extremely long accumulators (i.e., super accumulators). At the cell level, the design uses a synchronous model of VLSI. The amount of time the apparatus needs to compute an entire sum depends on the values of summands; at this architectural level, the design is asynchronous. The throughput per unit area of hardware approaches that of a tree network, but without the long wire and signal propagation delay that are intrinsic to tree networks.
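
The arithmetic idea behind the abstract (convert each floating-point summand to fixed point, add in a very long accumulator, round only once at the end) can be illustrated in software. The following is a minimal sketch, not the systolic hardware design described in the paper: it assumes IEEE 754 double-precision inputs, uses a Python arbitrary-precision integer in place of the super accumulator, and the names `FRACTION_BITS`, `to_fixed`, and `super_sum` are hypothetical, chosen only for this illustration.

```python
import math

# Assumption: inputs are IEEE 754 doubles. Every finite double is an integer
# multiple of 2**-1074 (the smallest positive subnormal), so a fixed-point
# format with 1074 fractional bits represents each summand exactly.
FRACTION_BITS = 1074


def to_fixed(x: float) -> int:
    """Convert a finite double to an exact fixed-point integer (units of 2**-1074)."""
    if not math.isfinite(x):
        raise ValueError("only finite summands are supported in this sketch")
    # as_integer_ratio() returns an exact fraction whose denominator is a power of two.
    numerator, denominator = x.as_integer_ratio()
    return numerator * ((1 << FRACTION_BITS) // denominator)


def super_sum(summands) -> float:
    """Sum doubles without intermediate rounding, then round once at the end."""
    accumulator = 0  # stands in for the "extremely long accumulator"
    for x in summands:
        accumulator += to_fixed(x)  # exact integer addition
    # Convert the exact fixed-point total back to a double with one final rounding.
    return accumulator / (1 << FRACTION_BITS)


if __name__ == "__main__":
    values = [1e100, 1.0, -1e100]
    print(sum(values))        # ordinary left-to-right floating-point summation
    print(super_sum(values))  # exact accumulation
```

With the summands `1e100, 1.0, -1e100`, ordinary floating-point summation loses the `1.0` when it is absorbed into `1e100`, whereas the fixed-point accumulation retains it. The sketch processes summands sequentially and says nothing about the pipelining, packet switching, or area-time behavior that the paper addresses.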