We present two novel energy-efficient pipeline templates for high-throughput asynchronous circuits. The proposed templates, called N-P and N-Inverter pipelines, use a single-track handshake protocol and contain multiple stages of logic within each pipeline. The proposed techniques minimize the handshake overheads associated with input tokens and with intermediate logic nodes within a pipeline template. Each template can pack a significant amount of logic into a single stage while still maintaining a fast cycle time of only 18 transitions. The noise and timing robustness constraints of our pipelined circuits are quantified across all process corners. We present a completion detection scheme based on wide NOR gates, which yields significant latency and energy savings, especially as the number of outputs increases. To fully quantify the design trade-offs, three separate pipeline implementations of an 8x8-bit Booth-encoded array multiplier are presented. Compared to a standard QDI pipeline implementation, the N-Inverter and N-P pipeline implementations reduced the energy-delay product by 38.5% and 44%, respectively. The overall multiplier latency was reduced by 20.2% and 18.7%, and the total transistor width was reduced by 35.6% and 46%, with the N-Inverter and N-P pipeline templates respectively.
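As a rough illustration of the single-track handshake protocol mentioned above, the following Python sketch models one shared wire on which the sender raises the level to deliver a token and the receiver lowers it to acknowledge. It is a behavioral toy only: the class `SingleTrackWire`, its methods, and the helper `transfer` are hypothetical names introduced here, and the sketch does not model the N-P or N-Inverter templates, their internal logic stages, or the 18-transition cycle time.

```python
class SingleTrackWire:
    """Illustrative model of one wire shared by a sender and a receiver.

    Assumption: the sender only ever drives the wire high and the
    receiver only ever drives it low, as in single-track handshaking.
    """

    def __init__(self):
        self.level = 0        # 0 = empty (idle), 1 = token present
        self.transitions = 0  # number of transitions seen on the wire

    def send(self):
        # The sender may raise the wire only after the previous token
        # has been consumed, i.e. the wire is back at 0.
        if self.level != 0:
            raise RuntimeError("wire still holds an unconsumed token")
        self.level = 1
        self.transitions += 1

    def consume(self):
        # The receiver acknowledges by pulling the same wire low.
        if self.level != 1:
            raise RuntimeError("no token to consume")
        self.level = 0
        self.transitions += 1


def transfer(wire, n_tokens):
    """Move n_tokens across the wire and return the transition count."""
    for _ in range(n_tokens):
        wire.send()
        wire.consume()
    return wire.transitions


if __name__ == "__main__":
    wire = SingleTrackWire()
    print(transfer(wire, 3))
```

In this model each token costs two transitions on the shared wire, whereas a conventional four-phase handshake on separate request and acknowledge wires needs four transitions per token.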