Speech, gesture, and visual feature recognition are important interface capabilities for future embedded mobile systems. Unfortunately, current embedded processors cannot meet the real-time performance requirements of complex perception applications, and these requirements often exceed even the capability of high-performance microprocessors. The energy consumption of current high-performance processors, in turn, is far beyond what the embedded space can afford. The usual approach is to resort to a custom ASIC to meet performance and energy constraints. However, ASICs incur expensive and lengthy design cycles, and they are so specialized that they cannot support multiple applications or even evolutionary improvements to a single application.

This dissertation introduces a VLIW perception processor that achieves very high performance on perceptual algorithms at low energy consumption. It combines clustered function units, compiler-controlled dataflow, and compiler-controlled clock gating with hardware support for modulo scheduling, address generation units, and a scratch-pad memory system. The architecture is evaluated using benchmark algorithms drawn from the domains of complex speech and visual feature recognition, security, and signal processing. Because energy and delay are common design trade-offs, the energy-delay product of a CMOS implementation of the perception processor is compared against those of ASICs and general-purpose processors. Using a combination of SPICE simulations, real processor power measurements, and architectural simulation, it is shown that the perception processor running at 1 GHz outperforms a 2.4 GHz Pentium 4 by a factor of 1.75. While delivering this performance, it simultaneously achieves an energy-delay product 159 times better than that of a low-power Intel XScale embedded processor.
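The comparisons above use the energy-delay product (EDP). With energy $E = P \cdot D$ for average power $P$ and execution time (delay) $D$, the metric is $\mathrm{EDP} = E \cdot D = P \cdot D^2$, so a design that draws more power can still win decisively if it finishes much sooner. The following is a minimal Python sketch of that arithmetic, assuming a single average-power figure per run. The names `Run`, `edp_improvement`, and `speedup`, and all numbers used, are hypothetical illustrations, not measurements or code from the dissertation.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark execution on one platform (hypothetical values)."""
    name: str
    avg_power_w: float  # average power over the run, in watts
    delay_s: float      # execution time, in seconds

    @property
    def energy_j(self) -> float:
        # Energy = average power x time
        return self.avg_power_w * self.delay_s

    @property
    def edp(self) -> float:
        # Energy-delay product in joule-seconds; lower is better
        return self.energy_j * self.delay_s


def speedup(baseline: Run, candidate: Run) -> float:
    """How many times faster the candidate finishes than the baseline."""
    return baseline.delay_s / candidate.delay_s


def edp_improvement(baseline: Run, candidate: Run) -> float:
    """How many times lower (better) the candidate's EDP is than the baseline's."""
    return baseline.edp / candidate.edp


if __name__ == "__main__":
    # Hypothetical platforms: the candidate draws twice the power but is 8x faster.
    base = Run("baseline", avg_power_w=0.5, delay_s=2.0)
    cand = Run("candidate", avg_power_w=1.0, delay_s=0.25)
    print(f"speedup: {speedup(base, cand):.1f}x")                  # speedup: 8.0x
    print(f"EDP improvement: {edp_improvement(base, cand):.1f}x")  # EDP improvement: 32.0x
```

Because delay enters the product squared, the hypothetical 8x speedup outweighs the 2x higher power draw and yields a 32x EDP improvement; this is why EDP, rather than power alone, is a fairer basis for comparing a fast accelerator against a slow low-power core.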
The perception processor makes sophisticated real-time perception applications possible within an energy budget commensurate with the embedded space, something current embedded processors cannot achieve.