In this article we propose parallel algorithms for constructing conforming finite-element discretizations on linear octrees. Existing octree-based discretizations scale to billions of elements, but their complexity constants can be high. Our approach uses several techniques to minimize overhead: a novel bottom-up tree construction and 2:1 balance-constraint enforcement; a Golomb-Rice encoding that compresses the octree and the element connectivity by representing them as a uniquely decodable code (UDC); overlapping of communication with computation; and byte alignment for cache efficiency. The cost of applying the Laplacian is comparable to that of applying it on a directly indexed regular-grid discretization with the same number of elements. Our algorithm has scaled up to four billion octants on 4096 processors of a Cray XT3 at the Pittsburgh Supercomputing Center. The overall tree construction takes under a minute, whereas previous implementations required several minutes, and evaluating the discretized variable-coefficient Laplacian takes only a few seconds.
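The abstract names Golomb-Rice encoding as the scheme that compresses the octree and element connectivity into a uniquely decodable code. The Python sketch below illustrates only the general technique, not the paper's implementation: each non-negative integer n is split into a quotient n >> k, written in unary, and a k-bit remainder, which produces a prefix-free (and therefore uniquely decodable) bit stream. How the paper maps connectivity to integers is not stated here, so the `to_gaps`/`from_gaps` helpers, the parameter `k = 2`, and the example node indices are hypothetical, assuming sorted node indices whose successive differences are small.

```python
from typing import List, Sequence


def rice_encode(values: Sequence[int], k: int) -> str:
    """Golomb-Rice encode non-negative integers with divisor 2**k.

    Each value n is written as its quotient q = n >> k in unary
    (q '1' bits followed by one terminating '0'), then its remainder
    n & (2**k - 1) as exactly k binary digits. A '0'/'1' string is
    returned instead of packed bytes to keep the sketch readable.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    parts: List[str] = []
    for n in values:
        if n < 0:
            raise ValueError("Golomb-Rice coding needs non-negative integers")
        q, r = n >> k, n & ((1 << k) - 1)
        parts.append("1" * q + "0")            # unary quotient
        if k:                                  # k = 0 means no remainder bits
            parts.append(format(r, f"0{k}b"))  # fixed-width remainder
    return "".join(parts)


def rice_decode(bits: str, k: int) -> List[int]:
    """Decode a bit string produced by rice_encode.

    The code is prefix-free, so no separators are needed: count '1' bits
    up to a '0' to get the quotient, then read k bits as the remainder.
    """
    values: List[int] = []
    pos = 0
    while pos < len(bits):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1                                   # skip the terminating '0'
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append((q << k) | r)
    return values


# Hypothetical helpers (an assumption, not from the paper): represent
# sorted node indices as small non-negative gaps from a base index.
def to_gaps(sorted_indices: Sequence[int], base: int = 0) -> List[int]:
    gaps: List[int] = []
    prev = base
    for idx in sorted_indices:
        if idx < prev:
            raise ValueError("indices must be sorted and >= base")
        gaps.append(idx - prev)
        prev = idx
    return gaps


def from_gaps(gaps: Sequence[int], base: int = 0) -> List[int]:
    out: List[int] = []
    cur = base
    for g in gaps:
        cur += g
        out.append(cur)
    return out


if __name__ == "__main__":
    # Hypothetical element: 8 node indices near the element's own index 100.
    nodes = [100, 101, 108, 109, 164, 165, 172, 173]
    bits = rice_encode(to_gaps(nodes, base=100), k=2)
    print(len(bits))  # 39 bits, versus 256 bits for eight 32-bit integers
    assert from_gaps(rice_decode(bits, k=2), base=100) == nodes
```

With k = 2, each small gap costs only k + 1 = 3 bits, while a rare large gap (55 above) grows linearly with its quotient. A common heuristic is to pick k near log2 of the typical gap; the value used here is an arbitrary illustration.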