API compilation for image hardware accelerators

Authors:
Fabien Coelho; François Irigoin
Affiliations:
MINES ParisTech, France; MINES ParisTech, France
Venue:
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Year:
2013

Abstract

We present an API-based compilation strategy that optimizes image applications developed with a high-level image processing library and maps them onto three different image processing hardware accelerators. We demonstrate that such a strategy is profitable for both development cost and overall performance, especially because it exploits optimization opportunities across library calls that are otherwise beyond reach. The library API provides the semantics of the image computations. The three image accelerator targets are quite distinct: the first uses a vector architecture; the second uses a SIMD architecture; the last runs on both GPGPUs and multicores through OpenCL. We have adapted standard compilation techniques to perform these compilation and code generation tasks automatically. Our strategy is implemented in PIPS, a source-to-source compiler, which greatly reduces the development cost because standard phases are reused and parameterized. We carried out experiments with applications on hardware functional simulators and GPUs. Our contributions include: (1) a general low-cost compilation strategy for image processing applications, based on the semantics provided by library calls, which improves locality by an order of magnitude; (2) specific heuristics to minimize execution time on the target accelerators; (3) numerous experiments that show the effectiveness of our strategies. We also discuss the conditions required to extend this approach to other application domains.
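The locality gain from optimizing across library calls can be illustrated with a deliberately small sketch. It is not the authors' PIPS implementation. It assumes a hypothetical library whose calls are pure per-pixel operators; the names `add_const`, `mul_const`, `threshold`, `Call`, `run_naive` and `run_fused` are invented for illustration. Executing the calls one at a time costs one full image pass per call and materializes every intermediate image. A compiler that knows each call's semantics can instead compose them and traverse the image once.

```python
from dataclasses import dataclass

# Hypothetical per-pixel operators standing in for library API calls.
# The compiler is assumed to know each call's semantics as a pure function.
POINTWISE = {
    "add_const": lambda p, c: p + c,
    "mul_const": lambda p, c: p * c,
    "threshold": lambda p, t: 255 if p >= t else 0,
}


@dataclass
class Call:
    op: str   # name of the (hypothetical) library function
    arg: int  # its scalar parameter


def run_naive(image, calls):
    """Execute each call separately: one full pass over the image per call,
    storing every intermediate image. Returns (result, number_of_passes)."""
    passes = 0
    for call in calls:
        f = POINTWISE[call.op]
        image = [f(p, call.arg) for p in image]
        passes += 1
    return image, passes


def run_fused(image, calls):
    """Compose the per-pixel functions of all calls and traverse the image
    once, keeping intermediate values in a local variable instead of memory."""
    def fused(p):
        for call in calls:
            p = POINTWISE[call.op](p, call.arg)
        return p
    return [fused(p) for p in image], 1


if __name__ == "__main__":
    img = [10, 30, 50, 100]
    calls = [Call("add_const", 20), Call("mul_const", 2), Call("threshold", 128)]
    print(run_naive(img, calls))  # ([0, 0, 255, 255], 3)
    print(run_fused(img, calls))  # ([0, 0, 255, 255], 1)
```

Both versions compute the same pixels, but the fused one reads and writes the image once instead of three times. The approach described in the abstract targets real accelerators and handles richer operators, such as neighborhood-based morphology, so this sketch only shows why call-level semantics make such cross-call optimization possible.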