Our ability to build systems with large amounts of hardware parallelism is outpacing the average software developer's ability to program them effectively. This problem plagues our industry. Because the vast majority of the world's software developers are not parallel programming experts, making it easy to write, port, and debug applications with sufficient core and vector parallelism is essential to enabling the use of multi- and many-core processor architectures. However, hardware architectures and vector ISAs are also shifting and diversifying quickly, making it difficult for a single binary to run well on all possible targets. Retargetability and dynamic compilation are therefore of growing relevance. This paper introduces Intel® Array Building Blocks (ArBB), a retargetable dynamic compilation framework. ArBB focuses on making it easier to write and port programs so that they can harvest data and thread parallelism on both multi-core and heterogeneous many-core architectures while staying within standard C++. ArBB interoperates with other programming models to help meet customer demand for a solution that offers both greater programmer productivity and good performance. This work makes contributions in language features, compiler architecture, code transformations, and optimizations. It presents performance data from the current beta release of ArBB and quantifies the impact of some key analyses, enabling transformations, and optimizations on a variety of benchmarks that are of interest to our customers.