Composition and reuse with compiled domain-specific languages

  • Authors:
  • Arvind K. Sujeeth; Tiark Rompf; Kevin J. Brown; HyoukJoong Lee; Hassan Chafi; Victoria Popic; Michael Wu; Aleksandar Prokopec; Vojin Jovanovic; Martin Odersky; Kunle Olukotun

  • Affiliations:
  • Stanford University; École Polytechnique Fédérale de Lausanne (EPFL), Switzerland and Oracle Labs; Stanford University; Stanford University; Stanford University and Oracle Labs; Stanford University; Stanford University; École Polytechnique Fédérale de Lausanne (EPFL), Switzerland; École Polytechnique Fédérale de Lausanne (EPFL), Switzerland; École Polytechnique Fédérale de Lausanne (EPFL), Switzerland; Stanford University

  • Venue:
  • ECOOP'13 Proceedings of the 27th European conference on Object-Oriented Programming
  • Year:
  • 2013

Abstract

Programmers who need high performance currently rely on low-level, architecture-specific programming models (e.g., OpenMP for CMPs, CUDA for GPUs, MPI for clusters). Performance optimization with these frameworks usually requires expertise in the specific programming model and a deep understanding of the target architecture. Domain-specific languages (DSLs) are a promising alternative, allowing compilers to map problem-specific abstractions directly to low-level architecture-specific programming models. However, developing DSLs is difficult, and using multiple DSLs together in a single application is even harder because existing compiled solutions do not compose. In this paper, we present four new performance-oriented DSLs developed with Delite, an extensible DSL compilation framework. We demonstrate new techniques for composing compiled DSLs that are embedded in a common backend within a single program, and we show that generic optimizations can be applied across the different DSL sections. Our new DSLs are implemented with a small number of reusable components (fewer than 9 parallel operators in total) and still achieve performance up to 125x better than library implementations and at worst within 30% of optimized stand-alone DSLs. The DSLs retain good performance when composed, and applying cross-DSL optimizations yields up to an additional 1.82x improvement.