Proceedings of the 14th international symposium on Systems synthesis
Viper: A Multiprocessor SOC for Advanced Set-Top Box and Digital TV Systems. IEEE Design & Test.
VLSI Architecture for Motion Estimation using the Block-Matching Algorithm. EDTC '96 Proceedings of the 1996 European conference on Design and Test.
Evaluating the Imagine Stream Architecture. Proceedings of the 31st annual international symposium on Computer architecture.
Real-Time Motion Estimation and Visualization on Graphics Cards. VIS '04 Proceedings of the conference on Visualization '04.
Traffic shaping for an FPGA based SDRAM controller with complex QoS requirements. Proceedings of the 42nd annual Design Automation Conference.
An Image Processor for Digital Film. ASAP '05 Proceedings of the 2005 IEEE International Conference on Application-Specific Systems, Architectures and Processors.
Evaluation of design alternatives for the 2-D-discrete wavelet transform. IEEE Transactions on Circuits and Systems for Video Technology.
FlexWAFE - a high-end real-time stream processing library for FPGAs. Proceedings of the 44th annual Design Automation Conference.
A high-end real-time digital film processing reconfigurable platform. EURASIP Journal on Embedded Systems.
Application development with the FlexWAFE real-time stream processing architecture for FPGAs. ACM Transactions on Embedded Computing Systems (TECS).
This paper presents a multi-board, multi-FPGA hardware/software architecture for computation-intensive, high-resolution (2048x2048 pixels), real-time (24 frames per second) digital film processing. It is based on Xilinx Virtex-II Pro FPGAs, large SDRAM memories for multiple-frame storage, and a PCI Express communication network. The architecture reaches record performance running a complex noise reduction algorithm that includes a 2.5-dimensional DWT and a full 16x16 motion estimation at 24 fps, requiring a total of 203 Gop/s of net computing performance and 28 Gbit/s of DDR-SDRAM frame memory bandwidth. To increase design productivity while still achieving high clock rates (125 MHz), the architecture combines macro component configuration and macro-level floorplanning with weak programmability through distributed microcoding. As an example, the core of the bidirectional motion estimation, which occupies 2772 CLBs, reaches 155 Gop/s (1538 op/pixel), and requires 7 Gbit/s of external memory bandwidth, was developed in two man-months.
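The throughput figures above can be cross-checked with simple arithmetic. The sketch below assumes (this is not stated in the abstract) that the op/pixel figures are normalized to the full 2048x2048 frame at 24 fps, and that 1 Gop = 10^9 operations and 1 Gbit = 10^9 bits. The helper names are hypothetical and for illustration only; the per-pixel values for the whole pipeline and for memory traffic are inferences from the stated totals, not numbers reported by the paper.

```python
# Frame geometry and rate stated in the abstract.
WIDTH = 2048
HEIGHT = 2048
FPS = 24


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels processed per second for a given frame size and frame rate."""
    return width * height * fps


def gops(ops_per_pixel: float, pixels_per_second: int) -> float:
    """Sustained throughput in Gop/s, assuming 1 Gop = 10**9 operations."""
    return ops_per_pixel * pixels_per_second / 1e9


def ops_per_pixel(total_gops: float, pixels_per_second: int) -> float:
    """Operations per pixel implied by a total throughput in Gop/s."""
    return total_gops * 1e9 / pixels_per_second


def bits_per_pixel(gbit_per_s: float, pixels_per_second: int) -> float:
    """Memory traffic per pixel implied by a bandwidth in Gbit/s."""
    return gbit_per_s * 1e9 / pixels_per_second


rate = pixel_rate(WIDTH, HEIGHT, FPS)
me_gops = gops(1538, rate)              # motion estimation core: 1538 op/pixel
total_opp = ops_per_pixel(203, rate)    # whole noise reduction pipeline: 203 Gop/s
mem_bpp = bits_per_pixel(28, rate)      # total DDR-SDRAM frame memory bandwidth
me_bpp = bits_per_pixel(7, rate)        # motion estimation external bandwidth

print(f"pixel rate: {rate / 1e6:.1f} Mpixel/s")                  # 100.7
print(f"motion estimation: {me_gops:.1f} Gop/s")                 # 154.8
print(f"whole pipeline: {total_opp:.0f} op/pixel")               # 2017
print(f"frame-memory traffic: {mem_bpp:.0f} bit/pixel")          # 278
print(f"motion-estimation memory traffic: {me_bpp:.0f} bit/pixel")  # 70
```

Under these assumptions, 1538 op/pixel at roughly 100.7 Mpixel/s gives about 154.8 Gop/s, which agrees with the 155 Gop/s quoted for the motion estimation core.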