PathFinder: a negotiation-based performance-driven router for FPGAs
FPGA '95 Proceedings of the 1995 ACM third international symposium on Field-programmable gate arrays
Using cluster-based logic blocks and timing-driven packing to improve FPGA speed and density
FPGA '99 Proceedings of the 1999 ACM/SIGDA seventh international symposium on Field programmable gate arrays
Automatic generation of FPGA routing architectures from high-level descriptions
FPGA '00 Proceedings of the 2000 ACM/SIGDA eighth international symposium on Field programmable gate arrays
Timing-driven placement for FPGAs
FPGA '00 Proceedings of the 2000 ACM/SIGDA eighth international symposium on Field programmable gate arrays
Using sparse crossbars within LUT
FPGA '01 Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays
Multiplexer restructuring for FPGA implementation cost reduction
Proceedings of the 42nd annual Design Automation Conference
Designing Efficient Input Interconnect Blocks for LUT Clusters Using Counting and Entropy
ACM Transactions on Reconfigurable Technology and Systems (TRETS) - Special edition on the 15th international symposium on FPGAs
Area and delay trade-offs in the circuit and architecture design of FPGAs
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Architectural modifications to enhance the floating-point performance of FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Automated transistor sizing for FPGA architecture exploration
Proceedings of the 45th annual Design Automation Conference
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Flexible multi-mode embedded floating-point unit for field programmable gate arrays
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
FPGA Floating Point Datapath Compiler
FCCM '09 Proceedings of the 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines
Floating-point FPGA: architecture and modeling
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Synthesis of Floating-Point Addition Clusters on FPGAs Using Carry-Save Arithmetic
FPL '10 Proceedings of the 2010 International Conference on Field Programmable Logic and Applications
Enhancing the area efficiency of FPGAs with hard circuits using shadow clusters
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The effect of LUT and cluster size on deep-submicron FPGA performance and density
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2002 international symposium on low-power electronics and design (ISLPED)
Towards simulator-like observability for FPGAs: a virtual overlay network for trace-buffers
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 0.00 |
In floating-point datapaths synthesized on FPGAs, the shifters that perform mantissa alignment and normalization consume a disproportionate number of LUTs. Shifters are implemented using several rows of small multiplexers; unfortunately, multiplexer-based logic structures map poorly onto LUTs. FPGAs, meanwhile, contain a large number of multiplexers in the programmable routing network; these multiplexer are placed under static control of the FPGA's configuration bitstream. In this work, we modify some of the routing multiplexers in the intra-cluster routing network of a CLB in an FPGA to implement shifters for floating-point mantissa alignment and normalization; the number of CLBs required for these operations is reduced by 67%. If shifting is not required, the routing multiplexers that have been modified can be configured to operate as normal routing multiplexers, so no functionality is sacrificed. The area overhead incurred by these modifications is small, and there is no need to modify every routing multiplexer in the FPGA. Experiments show that there is no negative impact in terms of clock frequency or routability for benchmarks that do not use the dynamic multiplexers.