While application-specific instruction-set processors (ASIPs) allow designers to add custom instructions that target specific applications, floating-point units are still instantiated as fixed, general-purpose units, which wastes area when they are not fully utilized. Embedded systems therefore need custom FPUs. Creating a custom FPU requires selecting a subset of the full floating-point instruction set and implementing that subset in hardware so that the application's runtime is minimized. To minimize area, it is desirable to merge the datapaths of the individual floating-point operations so that redundant hardware is reduced. Floating-point datapaths are complex and contain components of varying bit-widths, so components of different bit-widths must be shared. This, however, introduces the bit-alignment problem: determining how smaller resources should be aligned within larger ones when they are merged. This problem has been largely neglected in previous work. This paper therefore presents a novel algorithm for solving the bit-alignment problem that integrates cleanly into the datapath merging process. By solving the bit-alignment problem, it makes automatic datapath merging available for FPU generation. To explore the trade-offs between area and performance, we performed a rapid design space exploration to determine which FP operations should be implemented in hardware rather than emulated. Our results show that more floating-point hardware does not necessarily yield lower runtime if the additional hardware increases delay. Datapath merging with bit-alignment reduced area by an average of 22.5% across our benchmarks, compared with an average of 14.1% without bit-alignment.
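The abstract does not spell out the alignment algorithm itself. As a rough illustration of why alignment matters, the following Python sketch scores every possible offset of a narrower operation inside a wider shared resource and keeps the cheapest one. It uses a simple, assumed cost model: one 2:1 multiplexer for each input bit that the two datapaths drive from different sources. The helper names (`mux_cost`, `best_alignment`, `bits`) and the signals `r`, `s`, `t` are hypothetical. The exhaustive search over a single resource pair is not the paper's algorithm, which integrates alignment into the full datapath merging process. The sketch also assumes the narrow operation can legally occupy any bit range of the wide resource, and it ignores output-side and control costs.

```python
from typing import List, Sequence, Tuple

# A bit source: (signal name, bit index) that drives one resource input bit.
Bit = Tuple[str, int]
# One operand port, listed from least-significant bit upward.
Port = Sequence[Bit]


def mux_cost(wide_ports: Sequence[Port], narrow_ports: Sequence[Port], offset: int) -> int:
    """Count input bits needing a 2:1 mux when the narrow operation's operands
    start at bit `offset` of the wide resource (same offset on every port).

    Assumed cost model: a bit position needs a mux only if both datapaths
    drive it from different sources; bits used by one datapath alone cost 0.
    """
    cost = 0
    for wide, narrow in zip(wide_ports, narrow_ports):
        for j, src in enumerate(narrow):
            if wide[offset + j] != src:
                cost += 1
    return cost


def best_alignment(wide_ports: Sequence[Port], narrow_ports: Sequence[Port]) -> Tuple[int, int]:
    """Try every legal offset exhaustively and return (offset, mux count)."""
    width = len(wide_ports[0])
    narrow_width = len(narrow_ports[0])
    if narrow_width > width:
        raise ValueError("narrow operation does not fit in the wide resource")
    offsets = range(width - narrow_width + 1)
    # min() returns the first (lowest) offset among equally cheap choices.
    best = min(offsets, key=lambda k: mux_cost(wide_ports, narrow_ports, k))
    return best, mux_cost(wide_ports, narrow_ports, best)


def bits(name: str, lo: int, width: int) -> List[Bit]:
    """Bits lo .. lo+width-1 of signal `name`, LSB first."""
    return [(name, lo + i) for i in range(width)]


if __name__ == "__main__":
    # Hypothetical example: a 32-bit unit fed by signals r and s is merged with
    # a 16-bit operation fed by bits 8..23 of r and bits 0..15 of t.
    wide = [bits("r", 0, 32), bits("s", 0, 32)]
    narrow = [bits("r", 8, 16), bits("t", 0, 16)]
    print(mux_cost(wide, narrow, 0))     # naive LSB alignment: 32
    print(best_alignment(wide, narrow))  # (8, 16)
```

In this toy model, aligning the 16-bit operation at bit 8 lets its first operand reuse the existing connections from `r`. That halves the multiplexer count relative to the naive least-significant-bit alignment. A real merger would also have to weigh output wiring, the logic needed to keep unused bits inert, and interactions between the offsets chosen for different shared resources.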