Currently, the most powerful supercomputers deliver tens of petaflops, and future many-core systems are expected to reach an exaflop. However, power-budget constraints still make such machines infeasible and unaffordable. Floating-point units (FPUs) are critical to both the power consumption and the performance of today's microprocessors and supercomputers. The literature offers very different designs: some focus on increasing performance regardless of the cost, while others reduce power at the expense of performance. In this article, we propose a novel approach for reducing FPU power without degrading the other design parameters. Specifically, the power reduction is accompanied by an area reduction and a performance improvement, which together yield an overall energy gain. According to our experiments, the proposed unit consumes 17.5%, 23% and 16.5% less energy for single, double and quadruple precision, respectively, while also reducing delay by 15%, 21.5% and 14.5%. Area is reduced as well, by 4%, 4.5% and 5%, respectively.
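The claim that lower power combined with lower delay produces an overall energy gain follows from the basic relation between energy, power and time. The following is a minimal sketch, assuming energy per operation is modelled simply as average power $P$ times operation delay $t$. The subscripts "base" and "new" are illustrative labels, and the article's own energy model may be more detailed.

```latex
E = P \cdot t
\qquad\Longrightarrow\qquad
\frac{E_{\text{new}}}{E_{\text{base}}}
  = \frac{P_{\text{new}}}{P_{\text{base}}} \cdot \frac{t_{\text{new}}}{t_{\text{base}}}
```

When power and delay both decrease, both ratios on the right lie between 0 and 1. Their product is therefore smaller than either factor alone, so the relative energy saving exceeds the saving from either the power reduction or the delay reduction on its own.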