Micro-optimization of floating-point operations
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
A Library of Parameterized Floating-Point Modules and Their Use
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Embedded floating-point units in FPGAs
Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
Design and implementation of a modular and portable IEEE 754 compliant floating-point unit
Proceedings of the conference on Design, automation and test in Europe: Designers' forum
Open Source High Performance Floating-Point Modules
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Scientific applications vs. SPEC-FP: a comparison of program behavior
Proceedings of the 20th annual international conference on Supercomputing
Conjoining soft-core FPGA processors
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Rapid application specific floating-point unit generation with bit-alignment
Proceedings of the 45th annual Design Automation Conference
Instruction set extensions for dynamic time warping
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Hi-index | 0.00 |
Embedded systems designers often use fixed-point instead of floating-point due to the performance and area overhead of floating-point units. If the range of floating-point representation is required, the system may use a software-based floating-point library on an integer-only processor to save area--at the cost of much lower performance. Instead, we propose a Fractured Floating Point Unit (FFPU)--a hybrid solution that uses a set of custom hardware instructions to accelerate software-based floating-point emulation. An FFPU is intended as a compromise between software libraries and full FPUs in terms of both area and performance. We present four potential 32-bit FFPU designs for a Nios II soft processor. We compare their performance and area to the baseline Nios II, as well as a Nios II with a complete FPU. We show that an FFPU can improve various floating-point operations, including improving addition and subtraction performance by 24 to 52 percent over the baseline. This performance comes at a resource cost of only an 11 to 29 percent ALM increase, and no increase in DSP blocks.