Portable parallel performance from sequential, productive, embedded domain-specific languages

Authors:
Shoaib Kamil;Derrick Coetzee;Scott Beamer;Henry Cook;Ekaterina Gonina;Jonathan Harper;Jeffrey Morlan;Armando Fox
Affiliations:
University of California, Berkeley, CA, USA;University of California, Berkeley, CA, USA;University of California, Berkeley, CA, USA;University of California, Berkeley, CA, USA;University of California, Berkeley, CA, USA;Mississippi State University, Mississippi, MS, USA;University of California, Berkeley, CA, USA;University of California, Berkeley, CA, USA
Venue:
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Year:
2012



Abstract

Domain-expert productivity programmers desire scalable application performance, but usually must rely on efficiency programmers, who are experts in explicit parallel programming, to achieve it. Since such programmers are rare, to maximize reuse of their work we propose encapsulating their strategies in mini-compilers for domain-specific embedded languages (DSELs) glued together by a common high-level host language familiar to productivity programmers. The nontrivial applications that use these DSELs achieve up to 98% of peak attainable performance, comparable to or better than existing hand-coded implementations. Our approach is unique in that each mini-compiler not only performs conventional compiler transformations and optimizations, but also includes imperative procedural code that captures an efficiency expert's strategy for mapping a narrow domain onto a specific type of hardware. The result is source- and performance-portability for productivity programmers, and parallel performance that rivals that of hand-coded efficiency-language implementations of the same applications. We describe a framework that supports our methodology and five implemented DSELs supporting common computation kernels. Our results demonstrate that, for several interesting classes of problems, efficiency-level parallel performance can be achieved by packaging efficiency programmers' expertise in a reusable framework that is easy to use for both productivity programmers and efficiency programmers.
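
To make the mini-compiler idea concrete, the following is a minimal sketch and not the framework described in the paper. It assumes Python as the host language and a deliberately tiny domain: functions whose body is a single arithmetic expression applied elementwise to arrays. All names (`specialize`, `UnsupportedKernel`, the `target` parameter) are hypothetical, and the sketch only emits C source text; compiling and calling it is left out. The part that stands in for the efficiency expert's strategy is the ordinary procedural code that decides how to parallelize for a given target.

```python
import ast

# Hypothetical toy mini-compiler for a narrow "elementwise arithmetic" domain.
# It translates a restricted Python function into C source text.

_BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


class UnsupportedKernel(Exception):
    """Raised when the input falls outside the narrow domain handled here."""


def _expr_to_c(node, params):
    """Translate a restricted Python expression tree into a C expression."""
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        left = _expr_to_c(node.left, params)
        right = _expr_to_c(node.right, params)
        return f"({left} {_BINOPS[type(node.op)]} {right})"
    if isinstance(node, ast.Name) and node.id in params:
        return f"{node.id}[i]"  # every parameter is treated as an input array
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return repr(float(node.value))
    raise UnsupportedKernel(ast.dump(node))


def specialize(source, target="multicore"):
    """Return C source for a one-expression elementwise kernel.

    `target` is an assumed knob standing in for "a specific type of hardware".
    """
    tree = ast.parse(source)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        raise UnsupportedKernel("expected exactly one function definition")
    func = tree.body[0]
    stmts = func.body
    if (len(stmts) != 1 or not isinstance(stmts[0], ast.Return)
            or stmts[0].value is None):
        raise UnsupportedKernel("body must be a single return expression")
    params = [a.arg for a in func.args.args]
    if not params:
        raise UnsupportedKernel("kernel needs at least one array parameter")

    body = _expr_to_c(stmts[0].value, params)
    args = ", ".join(f"const double *{p}" for p in params)

    # Expert strategy, written as plain imperative code: on a multicore
    # target, split the independent loop iterations across threads.
    lines = [f"void {func.name}(double *out, {args}, int n) {{"]
    if target == "multicore":
        lines.append("    #pragma omp parallel for")
    lines.append("    for (int i = 0; i < n; ++i) {")
    lines.append(f"        out[i] = {body};")
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    kernel = "def scale_add(x, y):\n    return 2.0 * x + y\n"
    print(specialize(kernel))
```

The productivity programmer writes only the sequential `scale_add` function; the parallelization decision lives inside `specialize`, so supporting a different target would mean changing the strategy code rather than the application.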