Principles of CMOS VLSI design: a systems perspective.
Arithmetic for an SVD processor. Journal of Parallel and Distributed Computing: Parallelism in Computer Arithmetic.
Generalized Signed-Digit Number Systems: A Unifying Framework for Redundant Number Representations. IEEE Transactions on Computers.
Redundant and On-Line CORDIC: Application to Matrix Triangularization and SVD. IEEE Transactions on Computers.
Computer arithmetic algorithms.
Redundant CORDIC Methods with a Constant Scale Factor for Sine and Cosine Computation. IEEE Transactions on Computers.
The CORDIC Algorithm: New Results for Fast VLSI Implementation. IEEE Transactions on Computers.
Comments on Duprat and Muller's Branching CORDIC Paper. IEEE Transactions on Computers.
Fast CORDIC Algorithm Based on a New Recoding Scheme for Rotation Angles and Variable Scale Factors. Journal of VLSI Signal Processing Systems.
P-CORDIC: a precomputation based rotation CORDIC algorithm. EURASIP Journal on Applied Signal Processing.
A Parallel Double-Step CORDIC Algorithm for Digital Down Converter. Proceedings of the Seventh Annual Communication Networks and Services Research Conference (CNSR '09), 2009.
CORDIC architectures: a survey. VLSI Design.
VLSI architecture for low latency radix-4 CORDIC. Computers and Electrical Engineering.
Duprat and Muller [1] introduced the ingenious "Branching CORDIC" algorithm, which enables a fast implementation of the CORDIC algorithm using signed digits while keeping the normalization factor constant. The speedup is achieved by performing two basic CORDIC rotations in parallel in two separate modules. In their method, both modules perform identical computations except when the algorithm branches [1]. We improve the algorithm and show that it is possible to perform two circular-mode rotations in a single step with little additional hardware. In our method, the two modules perform distinct computations at every step, which leads to better hardware utilization and the possibility of further speedup over the original method. Architectures for VLSI implementation of our algorithm are discussed.
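For orientation, here is a minimal sketch of the conventional, non-redundant circular CORDIC in rotation mode, the baseline that Branching CORDIC accelerates. It is an illustrative assumption, not Duprat and Muller's method or the double-step algorithm described above: it uses floating point rather than a signed-digit datapath, and the function name `cordic_rotate` and the iteration count `n` are hypothetical. Restricting every direction to σ ∈ {−1, +1} (never 0) makes the overall gain identical for all inputs, so the constant correction factor is folded into the starting vector.

```python
import math


def cordic_rotate(theta: float, n: int = 32) -> tuple[float, float]:
    """Hypothetical helper: floating-point model of circular CORDIC, rotation mode.

    Not a signed-digit or branching implementation. Valid for |theta| up to
    sum(atan(2**-i)) over the iterations, roughly 1.74 rad.
    """
    # Elementary rotation angles atan(2^-i).
    angles = [math.atan(2.0 ** -i) for i in range(n)]
    # Constant scale correction: product of 1/sqrt(1 + 2^-2i).
    # It is input-independent because sigma is never 0.
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start from (K, 0) so the final vector needs no rescaling.
    x, y, z = k, 0.0, theta
    for i in range(n):
        # Direction from the exact sign of the residual angle z.
        sigma = 1.0 if z >= 0 else -1.0
        # Shift-and-add micro-rotation (tuple assignment uses the old x, y).
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * angles[i]
    # Approximately (cos(theta), sin(theta)).
    return x, y


c, s = cordic_rotate(0.5)
```

Each iteration needs the exact sign of `z` before it can choose σ. When `z` is held in a redundant signed-digit form, that sign cannot always be read from a few leading digits; according to [1], Branching CORDIC handles the undecided case by letting the two modules follow both directions.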