Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

Authors:
Jonathan Ragan-Kelley;Connelly Barnes;Andrew Adams;Sylvain Paris;Frédo Durand;Saman Amarasinghe
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA, USA;Adobe, Cambridge, MA, USA;Massachusetts Institute of Technology, Cambridge, MA, USA;Adobe, Cambridge, MA, USA;Massachusetts Institute of Technology, Cambridge, MA, USA;Massachusetts Institute of Technology, Cambridge, MA, USA
Venue:
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Year:
2013

Abstract

Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of magnitude. Efficient implementations require optimization of both parallelism and locality, but due to the nature of stencils, there is a fundamental tension between parallelism, locality, and introducing redundant recomputation of shared values. We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule. Combining this compiler with stochastic search over the space of schedules enables terse, composable programs to achieve state-of-the-art performance on a wide range of real image processing pipelines, and across different hardware architectures, including multicores with SIMD, and heterogeneous CPU+GPU execution. From simple Halide programs written in a few hours, we demonstrate performance up to 5x faster than hand-tuned C, intrinsics, and CUDA implementations optimized by experts over weeks or months, for image processing applications beyond the reach of past automatic compilers.
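
The tension the abstract describes between parallelism, locality, and redundant recomputation can be illustrated with a toy two-stage pipeline. The sketch below is plain Python, not Halide code and not the paper's compiler output. It evaluates the same algorithm (an unnormalized horizontal 3-tap sum followed by a vertical 3-tap sum, with clamped borders) under three hand-written "schedules" and reports how many first-stage values each one computes and how many it must hold at once. All function names (`schedule_breadth_first`, `schedule_fused`, `schedule_tiled`) and the tile size are hypothetical choices made for illustration.

```python
def clamp(i, n):
    """Clamp index i into [0, n - 1] (replicate-border handling)."""
    return max(0, min(i, n - 1))


def make_blur_x(img, counter):
    """Return the first stage as a function that counts its own evaluations."""
    w = len(img[0])

    def blur_x(x, y):
        counter[0] += 1
        return img[y][clamp(x - 1, w)] + img[y][x] + img[y][clamp(x + 1, w)]

    return blur_x


def schedule_breadth_first(img):
    """Compute all of stage 1, store it, then compute stage 2.
    No recomputation, but the whole intermediate image is live at once."""
    h, w = len(img), len(img[0])
    counter = [0]
    bx = make_blur_x(img, counter)
    tmp = [[bx(x, y) for x in range(w)] for y in range(h)]
    out = [[tmp[clamp(y - 1, h)][x] + tmp[y][x] + tmp[clamp(y + 1, h)][x]
            for x in range(w)] for y in range(h)]
    return out, counter[0], h * w


def schedule_fused(img):
    """Inline stage 1 into stage 2: no intermediate storage,
    but every output pixel recomputes three stage-1 values."""
    h, w = len(img), len(img[0])
    counter = [0]
    bx = make_blur_x(img, counter)
    out = [[bx(x, clamp(y - 1, h)) + bx(x, y) + bx(x, clamp(y + 1, h))
            for x in range(w)] for y in range(h)]
    return out, counter[0], 0


def schedule_tiled(img, tile_rows):
    """Compute stage 1 per tile of output rows, including the overlap rows
    that neighbouring tiles also need. Small buffers, modest recomputation."""
    h, w = len(img), len(img[0])
    counter = [0]
    bx = make_blur_x(img, counter)
    out, peak = [], 0
    for y0 in range(0, h, tile_rows):
        y1 = min(y0 + tile_rows, h)
        lo, hi = clamp(y0 - 1, h), clamp(y1, h)  # stage-1 rows this tile reads
        tmp = {y: [bx(x, y) for x in range(w)] for y in range(lo, hi + 1)}
        peak = max(peak, len(tmp) * w)
        for y in range(y0, y1):
            out.append([tmp[clamp(y - 1, h)][x] + tmp[y][x] + tmp[clamp(y + 1, h)][x]
                        for x in range(w)])
    return out, counter[0], peak


if __name__ == "__main__":
    image = [[(3 * x + 5 * y) % 7 for x in range(8)] for y in range(8)]
    for name, result in [
        ("breadth-first", schedule_breadth_first(image)),
        ("fused", schedule_fused(image)),
        ("tiled (4 rows)", schedule_tiled(image, 4)),
    ]:
        _, evaluations, peak_storage = result
        print(name, "stage-1 evaluations:", evaluations, "peak storage:", peak_storage)
```

All three functions return the same output image. They differ only in how much first-stage work is repeated and how much intermediate data is live at once, which is the kind of choice the abstract assigns to a schedule rather than to the algorithm. In this toy, the tiles are independent of one another and so could run in parallel, at the cost of recomputing the overlap rows.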