Real-time implementation of corner detection is crucial, as it is a key ingredient of other image processing kernels such as pattern recognition and motion detection. Motion detection in particular requires the analysis of a continuous flow of images, so real-time processing implies the use of highly optimized subroutines. We consider a tiled implementation of the Harris corner detection algorithm on the CELL processor. The algorithm is a chain of local operators, each applied to a pixel and its neighborhood. This memory access pattern exacerbates the penalty paid for transfers between levels of the memory hierarchy. Tiling is a commonly used technique to reduce the resulting time overhead. For image processing filters, incoming tiles are oversized to include their neighborhood, which is needed to update border pixels. As the volume of this "extra data" depends on the tile shape, we need to find a good tiling strategy. On the CELL, such an investigation is not directly possible with the native DMA routines. We overcome the problem by extending the DMA mechanism to handle non-conventional requests. Based on this extension, we run experiments on the CELL with a wide range of tile sizes and shapes, in an attempt to confirm our intuition about the optimal configuration.
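To illustrate why the volume of extra data depends on the tile shape, the following minimal sketch counts the pixels that must be loaded for one interior tile when each side is extended by a halo. It is not taken from the paper: the function names, the halo radius of 1 and the 4096-pixel tile area are assumptions chosen for illustration, and the count ignores DMA alignment and transfer-size constraints, which on real hardware may favor a different shape.

```python
def halo_overhead(tile_h, tile_w, radius):
    """Return (loaded, useful, extra_ratio) for one interior tile.

    `radius` is the assumed halo width in pixels added on every side.
    """
    useful = tile_h * tile_w
    loaded = (tile_h + 2 * radius) * (tile_w + 2 * radius)
    return loaded, useful, (loaded - useful) / useful


def compare_shapes(area, radius):
    """List every integer (h, w) with h * w == area, least extra data first."""
    shapes = [(h, area // h) for h in range(1, area + 1) if area % h == 0]
    return sorted(shapes, key=lambda s: (halo_overhead(s[0], s[1], radius)[2], s))


if __name__ == "__main__":
    # Hypothetical setting: tiles of 4096 pixels, halo of one pixel.
    for h, w in compare_shapes(4096, 1)[:5]:
        loaded, useful, ratio = halo_overhead(h, w, 1)
        print(f"{h:5d} x {w:<5d} loaded={loaded:6d} extra={ratio:.2%}")
```

For a fixed tile area, the extra volume is 2r(h + w) + 4r^2, so under this pure volume count it grows with the tile perimeter: a 64 x 64 tile loads 4356 pixels, whereas a 16 x 256 tile of the same area loads 4644.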