Improving the computational intensity of unstructured mesh applications

Authors:
Brian S. White; Sally A. McKee; Bronis R. de Supinski; Brian Miller; Daniel Quinlan; Martin Schulz
Affiliations:
Cornell University; Cornell University; Lawrence Livermore National Laboratory; Lawrence Livermore National Laboratory; Lawrence Livermore National Laboratory; Lawrence Livermore National Laboratory
Venue:
Proceedings of the 19th annual international conference on Supercomputing
Year:
2005

Abstract

Although unstructured mesh algorithms are a popular means of solving problems across a broad range of disciplines, from texture mapping to computational fluid dynamics, they are often dominated not by computation but by mesh overhead. Our study of an object-oriented mesh-based benchmark reveals that 72% of its execution time is spent on mesh-related operations, such as iterating over faces or chasing pointers. We report a series of optimizations, some traditional and some novel, that dramatically improve the benchmark's computational intensity (the ratio of floating-point operations to memory accesses). This improvement is attributable to an eight-fold reduction in memory operations and yields a 4.7x speedup in execution time.

Our work demonstrates that common subexpression elimination and code motion are important optimizations for mesh-based codes. However, conservative analysis prevents their automatic application. We discuss these barriers to analysis and argue that an understanding of mesh semantics complements more traditional analyses, such as pointer alias analysis, and certifies the correctness of these optimizations. Identifying the overheads in mesh-based codes, the optimizations that address them, and the limitations of current compiler analyses is a necessary step toward our eventual goal of automating these optimizations in a semantics-aware compiler.
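To make computational intensity concrete, here is a minimal Python sketch (not the benchmark's code) of a face loop on a hypothetical three-cell, two-face mesh. `CountingArray`, `make_mesh`, `naive_face_loop`, `optimized_face_loop`, and the loop-invariant scale factor `params[0]` are all illustrative assumptions: each array read or write stands in for one memory operation, and floating-point operations are counted by hand. The optimized loop applies the two transformations named above: code motion hoists `params[0]` out of the loop, and common subexpression elimination loads each face's cell indices once instead of at every use.

```python
from types import SimpleNamespace


class CountingArray:
    """List wrapper whose element reads and writes stand in for memory operations."""

    def __init__(self, data, counter):
        self._data = list(data)
        self._counter = counter

    def __getitem__(self, i):
        self._counter["mem"] += 1
        return self._data[i]

    def __setitem__(self, i, value):
        self._counter["mem"] += 1
        self._data[i] = value

    def snapshot(self):
        """Return a copy of the contents without counting it as an access."""
        return list(self._data)


def make_mesh(counter):
    """Hypothetical 1-D mesh: 3 cells joined by 2 faces (face f joins cells f and f+1)."""
    def wrap(xs):
        return CountingArray(xs, counter)

    return SimpleNamespace(
        nfaces=2,
        left=wrap([0, 1]),          # face -> left cell index (indirection)
        right=wrap([1, 2]),         # face -> right cell index (indirection)
        area=wrap([1.0, 2.0]),      # per-face geometry
        u=wrap([1.0, 2.0, 4.0]),    # per-cell solution value
        params=wrap([0.5]),         # hypothetical loop-invariant scale factor
        res=wrap([0.0, 0.0, 0.0]),  # per-cell residual, accumulated over faces
    )


def naive_face_loop(mesh, counter):
    """Each use re-reads the face->cell indirection and the invariant parameter."""
    for f in range(mesh.nfaces):
        avg = 0.5 * (mesh.u[mesh.left[f]] + mesh.u[mesh.right[f]])
        flux = avg * mesh.area[f] * mesh.params[0]
        mesh.res[mesh.left[f]] = mesh.res[mesh.left[f]] - flux
        mesh.res[mesh.right[f]] = mesh.res[mesh.right[f]] + flux
        counter["flops"] += 6  # add+mul (avg), 2 mul (flux), sub+add (updates)


def optimized_face_loop(mesh, counter):
    """Same arithmetic after code motion and common subexpression elimination."""
    scale = mesh.params[0]  # code motion: legal because the loop never writes params
    for f in range(mesh.nfaces):
        # CSE: load each face->cell index once and reuse it.
        left_cell, right_cell = mesh.left[f], mesh.right[f]
        avg = 0.5 * (mesh.u[left_cell] + mesh.u[right_cell])
        flux = avg * mesh.area[f] * scale
        mesh.res[left_cell] -= flux
        mesh.res[right_cell] += flux
        counter["flops"] += 6


def run(loop):
    """Run one face loop on a fresh mesh; return residuals, counts, and intensity."""
    counter = {"mem": 0, "flops": 0}
    mesh = make_mesh(counter)
    loop(mesh, counter)
    intensity = counter["flops"] / counter["mem"]  # flops per memory operation
    return mesh.res.snapshot(), counter, intensity


if __name__ == "__main__":
    for loop in (naive_face_loop, optimized_face_loop):
        res, counts, intensity = run(loop)
        print(f"{loop.__name__}: res={res} mem={counts['mem']} "
              f"flops={counts['flops']} intensity={intensity:.2f}")
```

In this toy, both loops produce identical residuals and perform 12 floating-point operations, but the naive loop makes 28 memory operations and the optimized one 19, so intensity rises from about 0.43 to about 0.63. These numbers describe only the sketch, not the benchmark's measured results. In a language such as C++, a compiler could hoist `params[0]` and reuse the loaded cell indices only if it proved that stores to `res` never modify `params`, `left`, or `right`; when alias analysis cannot prove this, the redundant loads must stay, which is the kind of barrier discussed above.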