A framework for generalized control dependence
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Supporting Timing Analysis by Automatic Bounding of LoopIterations
Real-Time Systems - Special issue on worst-case execution-time analysis
Compile-time composition of run-time data and iteration reorderings
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
ICSE '81 Proceedings of the 5th international conference on Software engineering
Deriving Efficient Data Movement from Decoupled Access/Execute Specifications
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Guaranteed Loop Bound Identification from Program Traces for WCET
RTAS '09 Proceedings of the 2009 15th IEEE Symposium on Real-Time and Embedded Technology and Applications
Performance analysis of the OP2 framework on many-core architectures
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Liszt: a domain specific language for building portable mesh-based PDE solvers
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
A distributed data-parallel framework for analysis and visualization algorithm development
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Design and performance of the OP2 library for unstructured mesh applications
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Hi-index | 0.00 |
Applications based on unstructured meshes are typically compute intensive, leading to long running times. In principle, state-of-the-art hardware, such as multi-core CPUs and many-core GPUs, could be used for their acceleration but these esoteric architectures require specialised knowledge to achieve optimal performance. OP2 is a parallel programming layer which attempts to ease this programming burden by allowing programmers to express parallel iterations over elements in the unstructured mesh through an API call, a so-called OP2-loop. The OP2 compiler infrastructure then uses source-to-source transformations to realise a parallel implementation of each OP2-loop and discover opportunities for optimisation. In this paper, we describe how several compiler techniques can be effectively utilised in tandem to increase the performance of unstructured mesh applications. In particular, we show how whole-program analysis --- which is often inhibited due to the size of the control flow graph - often becomes feasible as a result of the OP2 programming model, facilitating aggressive optimisation. We subsequently show how whole-program analysis then becomes an enabler to OP2-loop optimisations. Based on this, we show how a classical technique, namely loop fusion, which is typically difficult to apply to unstructured mesh applications, can be defined at compile-time. We examine the limits of its application and show experimental results on a computational fluid dynamic application benchmark, assessing the performance gains due to loop fusion.