Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Programming parallel algorithms
Communications of the ACM
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Building domain-specific embedded languages
ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
An annotation language for optimizing software libraries
Proceedings of the 2nd conference on Domain-specific languages
A case for source-level transformations in MATLAB
Proceedings of the 2nd conference on Domain-specific languages
Domain-specific languages: an annotated bibliography
ACM SIGPLAN Notices
IEEE Transactions on Parallel and Distributed Systems
Parallel and Distributed Haskells
Journal of Functional Programming
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
When and how to develop domain-specific languages
ACM Computing Surveys (CSUR)
LINQ: reconciling object, relations and XML in the .NET framework
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Accelerator: using data parallelism to program GPUs for general-purpose uses
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Performance evaluation of GPUs using the RapidMind development platform
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
KHEPERA: a system for rapid implementation of domain specific languages
DSL'97 Proceedings of the Conference on Domain-Specific Languages on Conference on Domain-Specific Languages (DSL), 1997
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Parallel Programmability and the Chapel Language
International Journal of High Performance Computing Applications
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Harmony: an execution model and runtime for heterogeneous many core systems
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Distributed data-parallel computing using a high-level programming language
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
CUDA-level performance with python-level productivity for Gaussian mixture model applications
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
CudaDMA: optimizing GPU memory bandwidth via warp specialization
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
PARRAY: a unifying array representation for heterogeneous parallelism
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
OpenMP-style parallelism in data-centered multicore computing with R
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Green-Marl: a DSL for easy and efficient graph analysis
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
A distributed data-parallel framework for analysis and visualization algorithm development
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Exploiting parallelism in deterministic shared memory multiprocessing
Journal of Parallel and Distributed Computing
Parakeet: a just-in-time parallel accelerator for python
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
Evaluating scala, actors, & ontologies for intelligent realtime interactive systems
Proceedings of the 18th ACM symposium on Virtual reality software and technology
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
KFusion: optimizing data flow without compromising modularity
Proceedings of the 12th annual international conference on Aspect-oriented software development
Terra: a multi-stage language for high-performance computing
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Modelling and Simulation in Engineering
Scalable multimedia content analysis on parallel platforms using python
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Forge: generating a high performance DSL implementation from a declarative specification
Proceedings of the 12th international conference on Generative programming: concepts & experiences
Composition and reuse with compiled domain-specific languages
ECOOP'13 Proceedings of the 27th European conference on Object-Oriented Programming
Algebraic program semantics for supercomputing
Theories of Programming and Formal Methods
Singe: leveraging warp specialization for high performance on GPUs
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
A Framework for Multiplatform HPC Applications
Proceedings of Programming Models and Applications on Multicores and Manycores
Leveraging GPUs using cooperative loop speculation
ACM Transactions on Architecture and Code Optimization (TACO)
Parallel Computing
Hi-index | 0.01 |
Exploiting heterogeneous parallel hardware currently requires mapping application code to multiple disparate programming models. Unfortunately, general-purpose programming models available today can yield high performance but are too low-level to be accessible to the average programmer. We propose leveraging domain-specific languages (DSLs) to map high-level application code to heterogeneous devices. To demonstrate the potential of this approach we present OptiML, a DSL for machine learning. OptiML programs are implicitly parallel and can achieve high performance on heterogeneous hardware with no modification required to the source code. For such a DSL-based approach to be tractable at large scales, better tools are required for DSL authors to simplify language creation and parallelization. To address this concern, we introduce Delite, a system designed specifically for DSLs that is both a framework for creating an implicitly parallel DSL as well as a dynamic runtime providing automated targeting to heterogeneous parallel hardware. We show that OptiML running on Delite achieves single-threaded, parallel, and GPU performance superior to explicitly parallelized MATLAB code in nearly all cases.