Recognizing speech, gestures, and visual features is an important interface capability for future embedded mobile systems. Unfortunately, current embedded processors cannot meet the real-time performance requirements of complex perception applications. These requirements often exceed even the performance of high-performance microprocessors, whose energy consumption far exceeds embedded energy budgets. Custom ASICs solve this problem, but they are inflexible and incur expensive, lengthy design cycles. This paper introduces a VLIW perception processor that combines clustered function units, compiler-controlled dataflow, and compiler-controlled clock gating with a scratch-pad memory system to achieve high performance on perceptual algorithms at low energy consumption. The architecture is evaluated using ten benchmark applications taken from the complex speech and visual feature recognition, security, and signal processing domains. The energy-delay product of a 0.13 μm implementation of this architecture is compared against ASICs and general-purpose processors. Using a combination of SPICE simulations and real processor power measurements, we show that the cluster, running at a 1 GHz clock frequency, outperforms a 2.4 GHz Pentium 4 by a factor of 1.75 while simultaneously achieving a 159 times better energy-delay product than a low-power Intel XScale embedded processor.
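The energy-delay product (EDP) used in the comparison weights energy and runtime together. For a run that consumes energy E (joules) and takes time t (seconds), EDP = E × t; since E = P × t for average power P, EDP = P × t². Lower is better, so "N times better EDP" means the baseline's EDP is N times the candidate's. The following is a minimal sketch of that arithmetic only; the class and function names are hypothetical, and the power and runtime values in the usage example are placeholders, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMeasurement:
    """Average power (watts) and runtime (seconds) for one benchmark run (hypothetical type)."""
    avg_power_w: float
    runtime_s: float

    @property
    def energy_j(self) -> float:
        # Energy = average power x time.
        return self.avg_power_w * self.runtime_s

    @property
    def edp(self) -> float:
        # Energy-delay product = energy x delay = power x time^2 (units: J*s).
        return self.energy_j * self.runtime_s


def speedup(baseline: RunMeasurement, candidate: RunMeasurement) -> float:
    """How many times faster the candidate finishes than the baseline."""
    return baseline.runtime_s / candidate.runtime_s


def edp_improvement(baseline: RunMeasurement, candidate: RunMeasurement) -> float:
    """How many times lower (better) the candidate's EDP is than the baseline's."""
    return baseline.edp / candidate.edp


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the arithmetic.
    baseline = RunMeasurement(avg_power_w=2.0, runtime_s=4.0)   # E = 8 J, EDP = 32 J*s
    candidate = RunMeasurement(avg_power_w=1.0, runtime_s=2.0)  # E = 2 J, EDP = 4 J*s
    print(speedup(baseline, candidate))          # 2.0
    print(edp_improvement(baseline, candidate))  # 8.0
```

Because delay enters the product twice (once directly and once through energy), halving runtime at constant power already lowers EDP fourfold; a design that is both faster and lower-power compounds the two effects.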