Wordwidth, instructions, looping, and virtualization: the role of sharing in absolute energy minimization

Authors:
André DeHon
Affiliations:
University of Pennsylvania, Philadelphia, PA, USA
Venue:
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
Year:
2014

Abstract

When are FPGAs more energy efficient than processors? This question is complicated by technology factors and the wide range of application characteristics that can be exploited to minimize energy. Using a wire-dominated energy model to estimate the absolute energy required for programmable computations, we determine when spatially organized programmable computations (FPGAs) require less energy than temporally organized programmable computations (processors). The crossover point depends on the metal layers available, the locality, the SIMD wordwidth regularity, and the compactness of the instructions. When the Rent exponent, p, is less than 0.7, the spatial design is always more energy efficient. When p=0.8, the technology offers 8 metal layers for routing, and data can be organized into 16b words and processed in tight loops of no more than 128 instructions, the temporal design uses less energy when the number of LUTs is greater than 64K. We further show that heterogeneous multicontext architectures can use even less energy than the p=0.8, 16b word temporal case.