We describe a high-level synthesis (HLS) implementation of a parallel architecture for face detection. The detection method is the well-known Convolutional Face Finder (CFF) algorithm, which consists of a pipeline of convolution operations. Starting from a dataflow model of the algorithm, we use a high-level synthesis tool to specify the local dataflows of our Processing Element (PE), describing in C the inter-PE communication, the fine-grained scheduling of the successive convolutions, and the memory distribution and bandwidth. With this approach, we explore several implementation alternatives to find a compromise between the processing speed and the area of the PE. We then build a parallel architecture composed of a ring of PEs and a FIFO memory, a generic architecture capable of processing images of different sizes. A ring of 25 PEs running at 80 MHz processes 127 QVGA images per second or 35 VGA images per second.
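To put the reported frame rates on a common scale, the short C sketch below converts them into pixel throughput and average clock cycles per input pixel. It assumes the standard resolutions of 320x240 for QVGA and 640x480 for VGA, which the abstract does not state. The function names are hypothetical and are not taken from the described design. The result is only a ring-wide average: it says nothing about how the work is split across the 25 PEs or the successive convolution stages.

```c
#include <assert.h>

/* Illustrative helpers only; names and signatures are assumptions,
 * not part of the architecture described in the abstract. */

/* Pixels consumed per second for a given frame size and frame rate. */
double pixels_per_second(int width, int height, double fps)
{
    assert(width > 0 && height > 0 && fps > 0.0);
    return (double)width * (double)height * fps;
}

/* Average clock cycles available per input pixel at clock_hz,
 * taken over the whole ring (not per PE). */
double cycles_per_pixel(double clock_hz, int width, int height, double fps)
{
    assert(clock_hz > 0.0);
    return clock_hz / pixels_per_second(width, height, fps);
}
```

Under these assumptions, `cycles_per_pixel(80e6, 320, 240, 127.0)` is about 8.2 and `cycles_per_pixel(80e6, 640, 480, 35.0)` is about 7.4. The two image sizes therefore reach a similar pixel rate, roughly 10 million pixels per second. This is consistent with the claim that the architecture handles images of different sizes.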