Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Design methodology of ultra low-power MPEG4 codec core exploiting voltage scaling techniques
DAC '98 Proceedings of the 35th annual Design Automation Conference
Closed-loop adaptive voltage scaling controller for standard-cell ASICs
Proceedings of the 2002 international symposium on Low power electronics and design
Cycle-time aware architecture synthesis of custom hardware accelerators
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Time Redundancy Based Soft-Error Tolerance to Rescue Nanometer Technologies
VTS '99 Proceedings of the 1999 17TH IEEE VLSI Test Symposium
Using Dynamic Binary Translation to Fuse Dependent Instructions
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Static strands: safely collapsing dependence chains for increasing embedded power efficiency
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Error-resilient low-power DSP via path-delay shaping
Proceedings of the 48th Design Automation Conference
Hi-index | 0.00 |
Hardware accelerators are common in embedded systems that have high performance requirements but must still operate within stringent energy constraints. To facilitate short time-to-market and reduced non-recurring engineering costs, automatic systems that can rapidly generate hardware bearing both power and performance in mind are extremely attractive. This paper proposes the BLADES (Better-than-worst-case Loop Accelerator Design) system for automatically designing self-tuning hardware accelerators that dynamically select their best operating frequency and voltage based on environmental conditions, silicon variation, and input data characteristics. Errors in operation are detected by Razor flip-flops, and recovery is initiated. The architecture efficiently supports detection, rollback, and recovery to provide a highly adaptable and configurable loop accelerator. The overhead of deploying Razor flip-flops is significantly reduced by automatically chaining primitive computation operations together. Results on a range of loop accelerators show average energy savings of 32% gained by voltage scaling below the nominal supply voltage.