Image processing on a Digital Signal Processor (DSP) often requires image data to be stored in external memory, because fast on-chip memory is usually very limited. Processing images directly in external memory, however, incurs a significant performance penalty. This paper presents a double-buffering method based on Direct Memory Access (DMA), called Resource Optimized Slicing (ROS-DMA), which is intended to replace the Level 2 (L2) data cache. ROS-DMA transfers image slices into small intermediate buffers in fast internal memory, where they can be processed at the processor's full speed. Using DMA allows data transfers and processing to proceed in parallel. The method has a modular implementation, so its components are easy to reuse for different image processing operations. The sequence of transfers is organized to optimize the use of processor resources and thereby achieve the shortest possible execution time. ROS-DMA can yield substantially better performance than the L2 cache. Furthermore, we expect ROS-DMA to make it easier to obtain reliable and tight Worst-Case Execution Times (WCETs). In test runs on a Texas Instruments C6416 DSP, execution with ROS-DMA was up to six times faster than with the L2 cache.
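The abstract does not spell out ROS-DMA's actual transfer schedule, so the C sketch below only illustrates the general double-buffering (ping-pong) idea it builds on: while one slice is processed in internal memory, the next slice is fetched and the previous result is written back. `dma_start` and `dma_wait` are hypothetical placeholders, simulated here with a synchronous `memcpy`. The image size, slice height, and pixel-inversion operation are likewise illustrative assumptions. On a C6416, the transfers would be issued to the chip's DMA controller and the buffers would be placed in on-chip memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image and slice geometry, chosen for illustration only. */
#define WIDTH       64
#define HEIGHT      48
#define SLICE_ROWS  8
#define SLICE_BYTES (WIDTH * SLICE_ROWS)

/* Hypothetical stand-ins for an asynchronous DMA engine. This simulation
   copies synchronously with memcpy, so dma_wait() has nothing to do; on real
   hardware dma_start() would return immediately and the copy would proceed
   in the background while the CPU keeps computing. */
typedef int dma_id_t;

static dma_id_t dma_start(void *dst, const void *src, size_t n) {
    memcpy(dst, src, n);
    return 0;
}

static void dma_wait(dma_id_t id) {
    (void)id; /* a real driver would block until transfer `id` completes */
}

/* Example per-slice operation: invert 8-bit grayscale pixels. */
static void invert_slice(const uint8_t *in, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = (uint8_t)(255 - in[i]);
}

/* Double-buffered (ping-pong) slice processing: while slice k is processed
   from one internal input buffer, slice k+1 is fetched into the other, and
   finished slices are written back to external memory by DMA. */
static void process_image(const uint8_t *src_ext, uint8_t *dst_ext) {
    /* On a DSP these four buffers would be placed in fast on-chip memory. */
    static uint8_t in_buf[2][SLICE_BYTES];
    static uint8_t out_buf[2][SLICE_BYTES];
    dma_id_t in_id[2]  = {-1, -1};
    dma_id_t out_id[2] = {-1, -1};
    const int nslices = HEIGHT / SLICE_ROWS;

    assert(HEIGHT % SLICE_ROWS == 0); /* sketch assumes whole slices only */

    /* Prologue: start fetching slice 0 before entering the loop. */
    in_id[0] = dma_start(in_buf[0], src_ext, SLICE_BYTES);

    for (int k = 0; k < nslices; ++k) {
        int cur = k & 1, nxt = cur ^ 1;

        /* Prefetch slice k+1 into the input buffer that is not in use. */
        if (k + 1 < nslices)
            in_id[nxt] = dma_start(in_buf[nxt],
                                   src_ext + (size_t)(k + 1) * SLICE_BYTES,
                                   SLICE_BYTES);

        dma_wait(in_id[cur]);          /* input slice k has arrived */
        if (out_id[cur] >= 0)
            dma_wait(out_id[cur]);     /* write-back of slice k-2 is done */

        invert_slice(in_buf[cur], out_buf[cur], SLICE_BYTES);

        /* Write slice k back to external memory in the background. */
        out_id[cur] = dma_start(dst_ext + (size_t)k * SLICE_BYTES,
                                out_buf[cur], SLICE_BYTES);
    }

    /* Epilogue: drain any outstanding write-backs. */
    for (int b = 0; b < 2; ++b)
        if (out_id[b] >= 0)
            dma_wait(out_id[b]);
}
```

Using two input and two output buffers lets the fetch of slice k+1 and the write-back of slice k overlap with the processing of slice k. With a truly asynchronous DMA engine, the steady-state cost per slice then approaches the larger of the transfer time and the compute time rather than their sum. Only the first fetch and the last write-back remain exposed.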