Advanced handheld applications demand implementations with higher energy efficiency and higher performance. In typical implementations, finite-precision information becomes known only after fixed-point refinement, once the data flow has been frozen. In this paper we instead propose propagating finite-precision information to drive data-flow transformations and thereby achieve higher mapping efficiency. Given a flexible architecture with low run-time switching overhead, the data flow under execution can then be tuned opportunistically to provide the instantaneous computational accuracy the application requires. This minimizes both the average number of operations and their precision. The principle is demonstrated with an implementation of the 128-point FFT in a WLAN receiver. Compared with a conventional implementation, the number of cycles is reduced by 49% to 65%, depending on conditions external to the receiver.
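The idea of matching precision to the accuracy required at a given moment can be illustrated with a small sketch. It is not the method of the paper: it only quantizes the input of a floating-point 128-point FFT and picks the smallest word length that meets a target signal-to-quantization-noise ratio. The names `quantize`, `sqnr_db`, and `pick_word_length`, the candidate word lengths, and the dB targets are all assumptions chosen for illustration.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def quantize(x, frac_bits):
    """Round real and imaginary parts to a grid of 2**-frac_bits (no saturation)."""
    scale = 1 << frac_bits
    return [complex(round(v.real * scale) / scale,
                    round(v.imag * scale) / scale) for v in x]


def sqnr_db(reference, approx):
    """Signal-to-quantization-noise ratio in dB between two spectra."""
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - a) ** 2 for r, a in zip(reference, approx))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def pick_word_length(x, required_db, candidates=(4, 6, 8, 10, 12, 16)):
    """Return the smallest candidate fractional word length meeting required_db.

    Hypothetical selection rule: falls back to the widest candidate if none suffices.
    """
    reference = fft(x)
    for bits in candidates:
        if sqnr_db(reference, fft(quantize(x, bits))) >= required_db:
            return bits
    return candidates[-1]


# Usage: a relaxed accuracy target allows a shorter word length than a strict one.
random.seed(0)
samples = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(128)]
relaxed = pick_word_length(samples, required_db=20)
strict = pick_word_length(samples, required_db=60)
print(relaxed, strict)
```

In a real fixed-point design the intermediate butterfly results would also be quantized, and the required accuracy would come from receiver conditions rather than a fixed dB figure. The sketch only shows that a relaxed requirement admits a shorter word length.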