Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language

Authors:
Chris J. Newburn;Byoungro So;Zhenying Liu;Michael McCool;Anwar Ghuloum;Stefanus Du Toit;Zhi Gang Wang;Zhao Hui Du;Yongjian Chen;Gansha Wu;Peng Guo;Zhanglin Liu;Dan Zhang
Affiliations:
Performance and Productivity Libraries, Software and Services Group, Intel Corporation (all authors)
Venue:
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Year:
2011

Abstract

Our ability to create systems with large amounts of hardware parallelism is exceeding the average software developer's ability to program them effectively. This is a problem that plagues our industry. Since the vast majority of the world's software developers are not parallel programming experts, making it easy to write, port, and debug applications with sufficient core and vector parallelism is essential to enabling the use of multi- and many-core processor architectures. However, hardware architectures and vector ISAs are also shifting and diversifying quickly, making it difficult for a single binary to run well on all possible targets. Because of this, retargetability and dynamic compilation are of growing relevance. This paper introduces Intel® Array Building Blocks (ArBB), a retargetable dynamic compilation framework. The system focuses on making it easier to write and port programs so that they can harvest data and thread parallelism on both multi-core and heterogeneous many-core architectures, while staying within standard C++. ArBB interoperates with other programming models to help meet customer demand for a solution that offers both greater programmer productivity and good performance. This work makes contributions in language features, compiler architecture, code transformations, and optimizations. It presents performance data from the current beta release of ArBB and quantitatively shows the impact of some key analyses, enabling transformations, and optimizations for a variety of benchmarks that are of interest to our customers.