High level transforms for SIMD and low-level computer vision algorithms

Authors:
Lionel Lacassagne; Daniel Etiemble; Ali Hassan Zahraee; Alain Dominguez; Pascal Vezolle
Affiliations:
University Paris Sud, Orsay, France; University Paris Sud, Orsay, France; University Paris Sud, Orsay, France; Intel France, Paris, France; IBM France, Montpellier, France
Venue:
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Year:
2014

Abstract

This paper reviews a set of algorithmic transforms, called High Level Transforms, that accelerate low-level image processing algorithms on IBM, Intel and ARM SIMD multicore processors. We show that these optimizations provide a significant speedup, and we present a first evaluation of the 512-bit SIMD Xeon Phi. We emphasize that the combination of optimizations that yields the best execution time cannot be predicted, so systematic benchmarking is mandatory. Once the best configuration has been found for each architecture, we compare the resulting performance across architectures. The Harris point detection operator is chosen as representative of low-level image processing and computer vision algorithms: because it is composed of five convolutions, it is more complex than a simple filter and offers more opportunities to combine optimizations. The approach can be applied to a wide range of codes based on 2D stencils and convolutions.
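To make the "five convolutions" structure concrete, here is a minimal scalar sketch in C. The abstract does not list the five convolutions, so the sketch assumes one common formulation of the Harris operator: two 3x3 Sobel gradients followed by three 3x3 Gaussian smoothings of the gradient products. The function names (`conv3x3`, `harris_response`), the kernels, the 1/16 normalization and the response formula with parameter `k` are all illustrative assumptions, not the paper's code. None of the High Level Transforms or SIMD optimizations are applied; this is only an unoptimized baseline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* 3x3 stencil over the interior of an h x w row-major float image.
   Border pixels of dst are set to 0. The kernel is applied without
   flipping (correlation form); the resulting sign change cancels in
   the Harris products computed below. */
static void conv3x3(const float *src, float *dst, int h, int w,
                    const float k[9], float scale)
{
    for (size_t i = 0; i < (size_t)h * (size_t)w; i++)
        dst[i] = 0.0f;
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            float acc = 0.0f;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    acc += k[(dy + 1) * 3 + (dx + 1)]
                         * src[(size_t)(y + dy) * w + (x + dx)];
            dst[(size_t)y * w + x] = acc * scale;
        }
    }
}

/* Hypothetical baseline: writes det(M) - k * trace(M)^2 into out, where M
   is built from the smoothed gradient products (assumed formulation).
   Returns 0 on success, -1 if the scratch buffers cannot be allocated. */
int harris_response(const float *img, float *out, int h, int w, float k)
{
    static const float sobel_x[9] = {-1, 0, 1, -2, 0, 2, -1, 0, 1};
    static const float sobel_y[9] = {-1, -2, -1, 0, 0, 0, 1, 2, 1};
    static const float gauss[9]   = { 1, 2, 1, 2, 4, 2, 1, 2, 1};

    assert(img != NULL && out != NULL && h >= 3 && w >= 3);
    size_t n = (size_t)h * (size_t)w;
    float *buf = malloc(8 * n * sizeof *buf);
    if (buf == NULL)
        return -1;
    float *gx  = buf,         *gy  = buf + n;
    float *pxx = buf + 2 * n, *pxy = buf + 3 * n, *pyy = buf + 4 * n;
    float *sxx = buf + 5 * n, *sxy = buf + 6 * n, *syy = buf + 7 * n;

    conv3x3(img, gx, h, w, sobel_x, 1.0f);            /* convolution 1 */
    conv3x3(img, gy, h, w, sobel_y, 1.0f);            /* convolution 2 */
    for (size_t i = 0; i < n; i++) {                  /* point-wise products */
        pxx[i] = gx[i] * gx[i];
        pxy[i] = gx[i] * gy[i];
        pyy[i] = gy[i] * gy[i];
    }
    conv3x3(pxx, sxx, h, w, gauss, 1.0f / 16.0f);     /* convolution 3 */
    conv3x3(pxy, sxy, h, w, gauss, 1.0f / 16.0f);     /* convolution 4 */
    conv3x3(pyy, syy, h, w, gauss, 1.0f / 16.0f);     /* convolution 5 */
    for (size_t i = 0; i < n; i++) {
        float det   = sxx[i] * syy[i] - sxy[i] * sxy[i];
        float trace = sxx[i] + syy[i];
        out[i] = det - k * trace * trace;
    }
    free(buf);
    return 0;
}
```

Each of the five passes in this sketch reads and writes full-size intermediate buffers. A naive pipeline of this kind is a plausible starting point against which combinations of optimizations could be benchmarked on each architecture.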