The arithmetic cube

Authors:
R. M. Owens;M. J. Irwin
Affiliations:
The Pennsylvania State Univ., University Park, PA;The Pennsylvania State Univ., University Park, PA
Venue:
IEEE Transactions on Computers
Year:
1987

Citing 11
Cited 5

Digit-Pipelined Arithmetic as Illustrated by the Paste-Up System: A Tutorial

Computer
The cube-connected cycles: a versatile network for parallel computation

Communications of the ACM
Fast Transforms: Algorithms, Analyses, Applications

Fast Transforms: Algorithms, Analyses, Applications
A model of computation for VLSI with related complexity results

STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
VLSI Implementation of Digital Fourier Transforms, Final Report

VLSI Implementation of Digital Fourier Transforms, Final Report
Bitonic Sort on a Mesh-Connected Parallel Computer

IEEE Transactions on Computers
Parallel Processing with the Perfect Shuffle

IEEE Transactions on Computers
A Mesh-Connected Area-Time Optimal VLSI Multiplier of Large Integers

IEEE Transactions on Computers
Two VLSI Structures for the Discrete Fourier Transform

IEEE Transactions on Computers
An architecture for a VLSI FFT processor

Integration, the VLSI Journal
VLSI Sorting with Reduced Hardware

IEEE Transactions on Computers

An overview of the Penn State design system

DAC '87 Proceedings of the 24th ACM/IEEE Design Automation Conference
Being Stingy with Multipliers

IEEE Transactions on Computers
DECOMPOSER: a synthesizer for systolic systems

DAC '88 Proceedings of the 25th ACM/IEEE Design Automation Conference
An Orthogonal Time-Frequency Extraction Approach to 2D Systolic Architecture for 1D DFT Computation

Journal of VLSI Signal Processing Systems
Architectural design of array processors for multi-dimensional discrete Fourier transform

Highly parallel computations

Abstract

We present the design of a VLSI processor which can be programmed to compute the discrete Fourier transform of a sequence of n points and which achieves the theoretical AT² lower bound of Ω(n²) for n ∈ N, where N is an infinite set. Furthermore, since the set N is also sufficiently dense, the processor achieves for any n the theoretical AT² lower bound of Ω(n²) for computing the cyclic convolution of two sequences of n points. Uniquely, our design achieves this bound without the use of data shuffling or long wires. Also, the processor uses only approximately √n multipliers, while many other designs need on the order of n multipliers to achieve the same time bounds. Since multipliers are usually much larger than adders, the processor presented in this paper should be smaller. The design also features layout regularity, minimal control, and nearest-neighbor interconnect of arithmetic cells of a few different types. These characteristics make it an ideal candidate for VLSI implementation.
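
The abstract ties the discrete Fourier transform to cyclic convolution. As a rough software illustration of that link (not the authors' architecture, and with no bearing on the area-time bounds), the sketch below checks the standard convolution theorem on a small input: the cyclic convolution of two length-n sequences equals the inverse DFT of the pointwise product of their DFTs. The function names and the sample sequences are assumptions chosen for the example, and the DFT is the plain quadratic-time definition rather than any fast or hardware-oriented scheme.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, straight from the definition."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n)
                for t in range(n))
        # The inverse transform carries the 1/n normalization.
        out.append(s / n if inverse else s)
    return out


def cyclic_convolution_direct(a, b):
    """Cyclic convolution from the definition: c[k] = sum_t a[t] * b[(k - t) mod n]."""
    n = len(a)
    return [sum(a[t] * b[(k - t) % n] for t in range(n)) for k in range(n)]


def cyclic_convolution_via_dft(a, b):
    """Cyclic convolution via the convolution theorem (forward DFTs, pointwise product, inverse DFT)."""
    fa, fb = dft(a), dft(b)
    return dft([p * q for p, q in zip(fa, fb)], inverse=True)


if __name__ == "__main__":
    a = [1, 2, 3, 4]  # illustrative input, not from the paper
    b = [1, 0, 0, 1]  # illustrative input, not from the paper
    direct = cyclic_convolution_direct(a, b)
    via_dft = cyclic_convolution_via_dft(a, b)
    print(direct)
    # The DFT route returns complex values; compare within a small tolerance.
    print(all(abs(u - v) < 1e-9 for u, v in zip(via_dft, direct)))
```

Both routes agree up to floating-point rounding, which is why a processor that computes DFTs can also serve for cyclic convolution.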