MPEG Handbook
Stream-Oriented FPGA Computing in the Streams-C High Level Language
FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
ACM Transactions on Design Automation of Electronic Systems (TODAES)
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
UML-based multiprocessor SoC design framework
ACM Transactions on Embedded Computing Systems (TECS)
Hardware-Software Codesign of Multimedia Embedded Systems: the PeaCE
RTCSA '06 Proceedings of the 12th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications
High-Performance Embedded Computing: Architectures, Applications, and Methodologies
High-Performance Embedded Computing: Architectures, Applications, and Methodologies
A framework for rapid system-level exploration, synthesis, and programming of multimedia MP-SoCs
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Rapid implementation and optimisation of DSP systems on SoPC heterogeneous platforms
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Multidimensional synchronous dataflow
IEEE Transactions on Signal Processing
IEEE Transactions on Signal Processing
IEEE Transactions on Multimedia
On the data reuse and memory bandwidth analysis for full-search block-matching VLSI architecture
IEEE Transactions on Circuits and Systems for Video Technology
Hi-index | 0.00 |
Hardware synthesis from dataflow graphs of signal processing systems is a growing research area as focus shifts to high level design methodologies. For data intensive systems, dataflow based synthesis can lead to an inefficient usage of memory due to the restrictive nature of synchronous dataflow and its inability to easily model data reuse. This paper explores how dataflow graph changes can be used to drive both the on-chip and off-chip memory organisation and how these memory architectures can be mapped to a hardware implementation. By exploiting the data reuse inherent to many image processing algorithms and by creating memory hierarchies, off-chip memory bandwidth can be reduced by a factor of a thousand from the original dataflow graph level specification of a motion estimation algorithm, with a minimal increase in memory size. This analysis is verified using results gathered from implementation of the motion estimation algorithm on a Xilinx Virtex-4 FPGA, where the delay between the memories and processing elements drops from 14.2 ns down to 1.878 ns through the refinement of the memory architecture. Care must be taken when modeling these algorithms however, as inefficiencies in these models can be easily translated into overuse of hardware resources.