Fine-grained vs. coarse-grained shift-and-add arithmetic in FPGAs (abstract only)

Authors:
Julien Lamoureux; Scott Miller; Mihai Sima
Affiliations:
University of Victoria, Victoria, BC, Canada (all three authors)
Venue:
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Year:
2010

Abstract

This study compares the speed, area, and latency of shift-and-add arithmetic implemented in the fine-grained resources of an FPGA with that of a proposed coarse-grained embedded block for FPGAs. It first optimizes the mapping of several shift-and-add architectures onto the fine-grained resources of a commercial FPGA to determine which gives the best area, delay, and latency at each word length. It then proposes a new coarse-grained block that supports 16-, 32-, and 64-bit shift-and-add arithmetic, and finally compares the coarse-grained implementations with the best fine-grained ones. Our results show that, depending on the implementation, the coarse-grained implementations are 15 to 47 times smaller and 5 to 18 times faster.
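The abstract does not name the shift-and-add algorithms that were mapped, so the following is only an illustration of the class of arithmetic being compared. CORDIC is a classic shift-and-add algorithm: in rotation mode it computes cosine and sine using nothing but additions, subtractions, arithmetic right shifts, and a small table of precomputed constants. The sketch below is a minimal software model under stated assumptions, not the authors' fine-grained mapping or their coarse-grained block. The function name `cordic_rotate`, the `word_bits` parameter, and the Q2.(n-2) fixed-point format are hypothetical choices made for this example. Python integers never overflow, so `word_bits` only sets the fraction width here, whereas a real n-bit datapath would wrap or saturate.

```python
import math
from typing import Optional, Tuple


def cordic_rotate(angle: float, word_bits: int = 32,
                  iterations: Optional[int] = None) -> Tuple[float, float]:
    """Return approximately (cos(angle), sin(angle)) via fixed-point CORDIC.

    Hypothetical software model: the loop body touches the integer datapath
    only with add/subtract and arithmetic right shifts (shift-and-add).
    Converges for |angle| <= ~1.743 rad (the sum of atan(2**-i)).
    """
    frac_bits = word_bits - 2          # assumed Q2.(n-2): sign bit + 1 integer bit
    if iterations is None:
        iterations = frac_bits         # roughly one result bit per iteration
    one = 1 << frac_bits               # fixed-point representation of 1.0

    # Precomputed constants (a ROM in hardware). They are built with floats,
    # so precision is limited to ~53 bits even for 64-bit word lengths.
    atan_table = [round(math.atan(2.0 ** -i) * one) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Pre-scale x by the CORDIC gain so no final multiplication is needed.
    x = round(gain * one)
    y = 0
    z = round(angle * one)             # residual angle still to rotate through

    for i in range(iterations):
        # Python's >> on negative ints is an arithmetic (floor) shift,
        # matching a sign-extending hardware shifter.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]

    return x / one, y / one
```

For example, `cordic_rotate(0.5, word_bits=16)` returns values close to `math.cos(0.5)` and `math.sin(0.5)`, with error on the order of 1e-3 because of the short word. Raising `word_bits` to 32 reduces the error to well below 1e-6. Each iteration is one add/subtract-and-shift stage, which is why word length drives both area and latency when the stages are built from fine-grained FPGA logic.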