Starting from sequential programs, we present an approach that combines data reuse, multi-level MapReduce, and pipelining to automatically find, within the design space, the most power-efficient designs that meet speed and area constraints on Field-Programmable Gate Arrays (FPGAs). This combined approach enables trade-offs among power, speed, and area: we show that a 63% reduction in power can be achieved with a 27% increase in execution time. Compared with the sequential designs, our approach yields designs with up to a 158-fold reduction in execution time. Moreover, for a given execution time, the combined approach generates designs that use up to 1.4 times less power than those produced by applying the same optimizations separately, and it can find solutions that are missed when the optimizations are applied separately.
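The selection step described above, choosing the lowest-power design among those that satisfy the speed and area constraints, can be illustrated with a minimal sketch. This is not the authors' exploration method: it is a brute-force filter over an already enumerated candidate list, and the `Design` type, the `most_power_efficient` function, the units, and all numbers are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Design:
    """One candidate design point (hypothetical fields and units)."""
    name: str
    power_mw: float      # estimated power
    time_ms: float       # estimated execution time
    area_slices: int     # estimated FPGA area


def most_power_efficient(
    designs: Iterable[Design],
    max_time_ms: float,
    max_area_slices: int,
) -> Optional[Design]:
    """Return the lowest-power design meeting both constraints, or None."""
    feasible = [
        d for d in designs
        if d.time_ms <= max_time_ms and d.area_slices <= max_area_slices
    ]
    # min() with default=None handles the case where no design is feasible.
    return min(feasible, key=lambda d: d.power_mw, default=None)


# Illustrative candidates only; these are not results from the paper.
candidates = [
    Design("sequential", 100.0, 400.0, 500),
    Design("pipelined", 180.0, 40.0, 1500),
    Design("reuse+pipelined", 150.0, 50.0, 1800),
    Design("reuse+mapreduce+pipelined", 260.0, 10.0, 6000),
]

best = most_power_efficient(candidates, max_time_ms=60.0, max_area_slices=2000)
print(best.name if best else "no feasible design")
```

Relaxing the time constraint in this sketch admits slower, lower-power candidates, which is the kind of power versus execution-time trade-off the abstract reports.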