This paper presents our work on PfeLib, a high-performance software library of image processing and computer vision algorithms for embedded systems. The main target platform for PfeLib is the TMS320C6000 series of digital signal processors (DSPs) from Texas Instruments. PfeLib contains several new approaches to problems that are typical when developing software for embedded systems. We propose a method for transferring image data from a development host (PC) to an embedded system for test and verification, which enables step-by-step performance optimization directly on the target platform. We describe an optimization procedure that illustrates our approach to obtaining the best possible DSP performance with reasonable development effort; speedups of up to a factor of 16 were achieved. We also address the limited on-chip memory of DSPs with a novel double buffering method based on direct memory access (DMA), called resource optimized slicing (ROS-DMA). ROS-DMA is intended to be used instead of the L2 cache and is a core component of PfeLib; it achieves up to six times faster image processing than using the L2 cache.
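To make the double buffering idea concrete, the following is a minimal sketch of generic ping-pong buffering over image slices. It is not the ROS-DMA method itself, whose slicing strategy is not described in this abstract. All names here (`dma_copy`, `process_slice`, `invert_double_buffered`, `SLICE_ROWS`, `MAX_WIDTH`) are hypothetical, and the DMA transfer is simulated with a synchronous `memcpy` so the code runs on a PC; on a real DSP the fetch of the next slice would be an asynchronous DMA request that overlaps with processing of the current slice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLICE_ROWS 4   /* hypothetical slice height in image rows */
#define MAX_WIDTH  64  /* hypothetical maximum image width in pixels */

/* Stand-in for a DMA transfer (assumption: simulated with memcpy).
 * On real hardware this would start an asynchronous transfer. */
static void dma_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Example per-slice kernel: invert every 8-bit pixel in place. */
static void process_slice(uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)(255 - buf[i]);
}

/* Process an image slice by slice using two small "on-chip" buffers.
 * While one buffer is processed, the other receives the next slice. */
void invert_double_buffered(const uint8_t *src, uint8_t *dst,
                            size_t width, size_t height)
{
    /* Two buffers standing in for fast on-chip memory. */
    static uint8_t onchip[2][SLICE_ROWS * MAX_WIDTH];

    assert(width <= MAX_WIDTH);

    size_t total = width * height;      /* bytes in the whole image */
    size_t slice = SLICE_ROWS * width;  /* bytes in one full slice */
    if (total == 0)
        return;

    int cur = 0;
    size_t off = 0;
    size_t len = total < slice ? total : slice;

    /* Prefetch the first slice before entering the loop. */
    dma_copy(onchip[cur], src, len);

    while (off < total) {
        size_t next_off = off + len;
        size_t next_len = 0;

        /* Fetch the next slice into the other buffer, if any remains. */
        if (next_off < total) {
            size_t rest = total - next_off;
            next_len = rest < slice ? rest : slice;
            dma_copy(onchip[cur ^ 1], src + next_off, next_len);
        }

        /* Process the current slice, then write it back. */
        process_slice(onchip[cur], len);
        dma_copy(dst + off, onchip[cur], len);

        /* Swap buffer roles for the next iteration. */
        off = next_off;
        len = next_len;
        cur ^= 1;
    }
}
```

With truly asynchronous transfers, the loop would additionally have to wait for the fetch to complete before processing a buffer, and for the write-back to complete before reusing it; those synchronization points are omitted here because `memcpy` returns only after the copy is done.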