Typically, scientists with computational needs prefer to use high-level languages such as Python or MATLAB; however, large, computationally intensive problems must eventually be recoded in a low-level language such as C or Fortran by expert programmers to achieve sufficient performance. In addition, multiple strategies may exist for mapping a problem onto parallel hardware, depending on the input data size and the hardware parameters. We show how to preserve the productivity of high-level languages while obtaining the performance of the best low-level code variant for a given hardware platform and problem size using SEJITS (Selective Embedded Just-in-Time Specialization), a set of techniques that leverages just-in-time code generation and compilation. As a case study, we demonstrate our technique for Gaussian Mixture Model (GMM) training using the expectation-maximization (EM) algorithm. With the addition of one line of code to import our framework, a domain programmer using an existing Python GMM library can run her program unmodified on a GPU-equipped computer and achieve performance that meets or beats GPU code hand-crafted by a human expert. We also show that, despite the overheads of running the domain expert's program in Python and of just-in-time code generation and compilation, our approach remains competitive with hand-crafted GPU code.
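
As a rough illustration of the programming model the abstract describes, the sketch below trains a diagonal-covariance GMM with plain NumPy EM; this is the kind of high-level code the domain programmer would write and leave unmodified. The code is ours, not the paper's library: the function name em_gmm is invented for this sketch, and the single framework import that would trigger specialization appears only as a comment, since the abstract does not name the module.

    import numpy as np

    # import sejits_framework  # hypothetical: the "one line of code" the abstract
    #                          # describes; the actual module name is not given here.

    def em_gmm(X, k, n_iter=50, seed=0):
        """Fit a k-component diagonal-covariance GMM to X (n_samples x n_dims) via EM."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        means = X[rng.choice(n, size=k, replace=False)]   # init means from data points
        variances = np.tile(X.var(axis=0), (k, 1))        # init variances from the data
        weights = np.full(k, 1.0 / k)                     # uniform mixture weights
        for _ in range(n_iter):
            # E-step: log density of every sample under every component, (n, k).
            log_p = (np.log(weights)
                     - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
                     - 0.5 * np.sum((X[:, None, :] - means) ** 2 / variances, axis=2))
            resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
            resp /= resp.sum(axis=1, keepdims=True)       # responsibilities; rows sum to 1
            # M-step: re-estimate mixture parameters from the responsibilities.
            nk = resp.sum(axis=0)                         # effective count per component
            weights = nk / n
            means = (resp.T @ X) / nk[:, None]
            variances = (resp.T @ X ** 2) / nk[:, None] - means ** 2
            variances = np.maximum(variances, 1e-6)       # variance floor for stability
        return weights, means, variances

    if __name__ == "__main__":
        X = np.random.default_rng(1).normal(size=(10000, 8))
        w, mu, var = em_gmm(X, k=5)
        print(w, mu.shape, var.shape)

Under the approach the abstract describes, code like this would not run the NumPy loop on large inputs; the framework would instead generate and compile a specialized low-level (e.g., CUDA) variant chosen for the problem size and hardware, while the pure-Python version above fixes the semantics being specialized.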