Computing systems are becoming increasingly parallel and heterogeneous, so new applications must exploit parallelism to continue achieving high performance. However, targeting these emerging devices often requires using multiple disparate programming models and making decisions that can limit forward scalability. In previous work, we proposed using domain-specific languages (DSLs) to provide high-level abstractions that enable transformation into high-performance parallel code without degrading programmer productivity. In this paper, we present the Delite Compiler Framework and Runtime, a new end-to-end system for building, compiling, and executing DSL applications on parallel heterogeneous hardware. The framework lifts embedded DSL applications to an intermediate representation (IR), performs generic, parallel, and domain-specific optimizations, and generates an execution graph that targets multiple heterogeneous hardware devices. Finally, we present results comparing the performance of several machine learning applications written in OptiML, a DSL for machine learning built on Delite, against C++ and MATLAB implementations. We find that the implicitly parallel OptiML applications achieve single-threaded performance comparable to C++ and outperform explicitly parallel MATLAB in nearly all cases.
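The pipeline described above (lift embedded DSL code to an IR, optimize it, then emit an execution graph for heterogeneous devices) can be illustrated with a small sketch. This is a hypothetical Python toy, not Delite's actual API; Delite itself is embedded in Scala. Every name here (`IRGraph`, `Rep`, `reflect`, `schedule`, and the `"gpu"`/`"cpu"` labels) is an assumption made for illustration. Operator overloading lifts expressions into IR nodes. Hash-consing stands in for a generic optimization (common subexpression elimination). Cancelling a double transpose stands in for a domain-specific rewrite. A topological sort over the nodes reachable from the result serves as the execution graph, which also drops dead code. It requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical device mapping: data-parallel element-wise ops go to a "gpu",
# everything else runs on the "cpu". Purely illustrative.
PARALLEL_OPS = {"add", "mul"}


class IRGraph:
    """Toy IR: each node is an (op, args) pair identified by an integer id."""

    def __init__(self):
        self.defs = []   # node id -> (op, args)
        self.index = {}  # (op, args) -> node id, used for hash-consing

    def reflect(self, op, *args):
        key = (op, tuple(args))
        # Generic optimization: an identical node already exists, so reuse it (CSE).
        if key in self.index:
            return self.index[key]
        # Domain-specific rewrite (illustrative): transpose(transpose(x)) == x.
        if op == "transpose" and self.defs[args[0]][0] == "transpose":
            return self.defs[args[0]][1][0]
        node_id = len(self.defs)
        self.defs.append(key)
        self.index[key] = node_id
        return node_id

    def input(self, name):
        return Rep(self, self.reflect("input", name))


class Rep:
    """A staged value: operations on it record IR nodes instead of computing."""

    def __init__(self, graph, node_id):
        self.graph, self.id = graph, node_id

    def __add__(self, other):
        return Rep(self.graph, self.graph.reflect("add", self.id, other.id))

    def __mul__(self, other):
        return Rep(self.graph, self.graph.reflect("mul", self.id, other.id))

    def t(self):
        return Rep(self.graph, self.graph.reflect("transpose", self.id))


def schedule(graph, result):
    """Build the execution graph reachable from `result` and order it.

    Nodes not reachable from the result are never scheduled (dead code
    elimination). Returns (node_id, op, device) triples in dependency order.
    """
    deps = {}
    stack = [result.id]
    while stack:
        n = stack.pop()
        if n in deps:
            continue
        _, args = graph.defs[n]
        inputs = [a for a in args if isinstance(a, int)]  # skip input names
        deps[n] = set(inputs)
        stack.extend(inputs)
    order = TopologicalSorter(deps).static_order()
    return [
        (n, graph.defs[n][0], "gpu" if graph.defs[n][0] in PARALLEL_OPS else "cpu")
        for n in order
    ]


if __name__ == "__main__":
    g = IRGraph()
    x, y = g.input("x"), g.input("y")
    a = x * y
    b = x * y              # same node as `a` after CSE
    c = (a + b).t().t()    # the two transposes cancel
    unused = x + x         # never reaches `c`, so it is not scheduled
    for step in schedule(g, c):
        print(step)
```

A real system such as the one described would attach far richer information to each node (parallel patterns, data layouts, per-device code generators); the sketch only shows how staging separates building the program from optimizing and scheduling it.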