The quality of compiler-optimized code for high-performance applications is far behind what optimization and domain experts can achieve by hand. Although it may seem surprising at first glance, the performance gap has been widening over time, due to the tremendous increase in the complexity of microprocessor and memory architectures, and to the rising level of abstraction of popular programming languages and styles. This paper explores in-between solutions: ways, neither fully automatic nor fully manual, to adapt a computationally intensive application to the target architecture. By mimicking complex sequences of transformations useful to optimize real codes, we show that generative programming is a practical means to implement architecture-aware optimizations for high-performance applications.

This work explores the promises of generative programming languages and techniques for the high-performance computing expert. We show that complex, architecture-specific optimizations can be implemented in a type-safe, purely generative framework. Peak performance is achievable through the careful combination of a high-level, multi-stage evaluation language, MetaOCaml, with low-level code generation techniques. Nevertheless, our results also show that generative approaches for high-performance computing do not come without technical caveats and implementation barriers concerning productivity and reuse. We describe these difficulties and identify ways to hide or overcome them, from abstract syntaxes to heterogeneous generators of code generators, combining high-level and type-safe multi-stage programming with a back-end generator of imperative code.
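To make the idea of implementing an optimization as a generator concrete, the following is a minimal sketch in Python rather than MetaOCaml, so it offers none of the type-safety guarantees discussed above. It is not taken from the paper: the names `gen_unrolled_dot` and `stage`, the choice of a dot product, and the unroll factor are all hypothetical, chosen only to illustrate a first stage that emits a loop-unrolled kernel specialized to a fixed vector length, and a second stage that runs it.

```python
def gen_unrolled_dot(n, unroll=4):
    """Stage 1 (hypothetical generator): return Python source for a dot
    product over vectors of fixed length n, main loop unrolled by `unroll`."""
    lines = ["def dot(a, b):", "    acc = 0.0"]
    main = n - n % unroll  # largest multiple of `unroll` that fits in n
    if main > 0:
        lines.append(f"    for i in range(0, {main}, {unroll}):")
        terms = " + ".join(f"a[i + {k}] * b[i + {k}]" for k in range(unroll))
        lines.append(f"        acc += {terms}")
    # Remainder iterations are emitted as straight-line code with constant indices.
    for j in range(main, n):
        lines.append(f"    acc += a[{j}] * b[{j}]")
    lines.append("    return acc")
    return "\n".join(lines)


def stage(source, name):
    """Stage 2: compile the generated source and return the named function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


# Specialize for length 10 with an unroll factor of 4 (both are assumptions).
dot10 = stage(gen_unrolled_dot(10, unroll=4), "dot")
```

Because this sketch builds code as plain strings, a malformed generator is only detected when the generated code is compiled or run; a typed multi-stage language such as MetaOCaml is meant to rule out that class of error when the generator itself is type-checked.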