Fat-trees: universal networks for hardware-efficient supercomputing
IEEE Transactions on Computers
Interconnect prediction for programmable logic devices
Proceedings of the 2001 international workshop on System-level interconnect prediction
The stratixπ routing and logic architecture
FPGA '03 Proceedings of the 2003 ACM/SIGDA eleventh international symposium on Field programmable gate arrays
A detailed power model for field-programmable gate arrays
ACM Transactions on Design Automation of Electronic Systems (TODAES)
On a Pin Versus Block Relationship For Partitions of Logic Graphs
IEEE Transactions on Computers
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
Design of minimum and uniform bipartites for optimum connection blocks of FPGA
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Power modeling and characteristics of field programmable gate arrays
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Location, location, location: the role of spatial locality in asymptotic energy minimization
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Hi-index | 0.00 |
When are FPGAs more energy efficient than processors? This question is complicated by technology factors and the wide range of application characteristics that can be exploited to minimize energy. Using a wire-dominated energy model to estimate the absolute energy required for programmable computations, we determine when spatially organized programmable computations (FPGAs) require less energy than temporally organized programmable computations (processors). The point of crossover will depend on the metal layers available, the locality, the SIMD wordwidth regularity, and the compactness of the instructions. When the Rent Exponent, p, is less than 0.7, the spatial design is always more energy efficient. When p=0.8, the technology offers 8-metal layers for routing, and data can be organized into 16b words and processed in tight loops of no more than 128 instructions, the temporal design uses less energy when the number of LUTs is greater than 64K. We further show that heterogeneous multicontext architectures can use even less energy than the p=0.8, 16b word temporal case.