Bottom-Up Construction and 2:1 Balance Refinement of Linear Octrees in Parallel

Authors:
Hari Sundar;Rahul S. Sampath;George Biros
Affiliations:
hsundar@seas.upenn.edu;rahulss@seas.upenn.edu;biros@seas.upenn.edu
Venue:
SIAM Journal on Scientific Computing
Year:
2008

Citing 0
Cited 20

Dendro: parallel algorithms for multigrid and AMR methods on 2:1 balanced octrees
Proceedings of the 2008 ACM/IEEE conference on Supercomputing

Scalable adaptive mantle convection simulation on petascale supercomputers
Proceedings of the 2008 ACM/IEEE conference on Supercomputing

A massively parallel adaptive fast-multipole method on heterogeneous architectures
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Biomechanically-Constrained 4D Estimation of Myocardial Motion
MICCAI '09 Proceedings of the 12th International Conference on Medical Image Computing and Computer-Assisted Intervention: Part II

Parallel Fast Gauss Transform
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

Fast high-dimensional approximation with sparse occupancy trees
Journal of Computational and Applied Mathematics

A Parallel Geometric Multigrid Method for Finite Elements on Octree Meshes
SIAM Journal on Scientific Computing

On well-separated sets and fast multipole methods
Applied Numerical Mathematics

Algorithms and data structures for massively parallel generic adaptive finite element codes
ACM Transactions on Mathematical Software (TOMS)

Efficiency Based Adaptive Local Refinement for First-Order System Least-Squares Formulations
SIAM Journal on Scientific Computing

p4est: Scalable Algorithms for Parallel Adaptive Mesh Refinement on Forests of Octrees
SIAM Journal on Scientific Computing

Peano—A Traversal and Storage Scheme for Octree-Like Adaptive Cartesian Multiscale Grids
SIAM Journal on Scientific Computing

A Second Order Discretization of Maxwell's Equations in the Quasi-Static Regime on OcTree Grids
SIAM Journal on Scientific Computing

A massively parallel adaptive fast multipole method on heterogeneous architectures
Communications of the ACM

Recovering geometric detail by octree normal maps
Transactions on Edutainment VII

Parallel geometric-algebraic multigrid on unstructured forests of octrees
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Adaptive fast multipole methods on the GPU
The Journal of Supercomputing

PRACE DECI (distributed european computing initiative) minisymposium
PARA'12 Proceedings of the 11th international conference on Applied Parallel and Scientific Computing

HykSort: a new variant of hypercube quicksort on distributed memory architectures
Proceedings of the 27th international ACM conference on International conference on supercomputing

Algorithms for high-throughput disk-to-disk sorting
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis


Abstract

In this article, we propose new parallel algorithms for the construction and 2:1 balance refinement of large linear octrees on distributed memory machines. Such octrees are used in many problems in computational science and engineering, e.g., object representation, image analysis, unstructured meshing, finite elements, adaptive mesh refinement, and N-body simulations. Fixed-size scalability and isogranular analyses of the algorithms were performed with an MPI-based parallel implementation on a variety of input data and demonstrated good scalability for different processor counts (1 to 1024 processors) on the Pittsburgh Supercomputing Center's TCS-1 AlphaServer. The results are consistent for different data distributions. Octrees with over a billion octants were constructed and balanced in less than a minute on 1024 processors. Like other existing algorithms for constructing and balancing octrees, our algorithms have $\mathcal{O}(N\log N)$ work and $\mathcal{O}(N)$ storage complexity. Under reasonable assumptions on the distribution of octants and the work per octant, the parallel time complexity is $\mathcal{O}(\frac{N}{n_p}\log(\frac{N}{n_p})+n_p\log n_p)$, where $N$ is the size of the final linear octree and $n_p$ is the number of processors.
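
To get a feel for the two terms in the parallel time bound above, the following minimal sketch evaluates them separately. It is an illustration only, not the authors' analysis: the function name `parallel_time_terms` is hypothetical, the constants hidden by the $\mathcal{O}$ notation are dropped, base-2 logarithms are assumed, and $N = 10^9$ is chosen to loosely match the "over a billion octants" figure.

```python
import math


def parallel_time_terms(n_octants: int, n_procs: int) -> tuple[float, float]:
    """Evaluate the two terms of (N/n_p) log(N/n_p) + n_p log(n_p).

    Assumptions (not from the source): hidden constants are set to 1
    and logarithms are base 2, so the values are only order-of-magnitude
    indicators, not runtime predictions.
    """
    per_proc = n_octants / n_procs                # N / n_p, octants per processor
    local_term = per_proc * math.log2(per_proc)   # (N/n_p) log(N/n_p)
    procs_term = n_procs * math.log2(n_procs)     # n_p log n_p
    return local_term, procs_term


if __name__ == "__main__":
    n_octants = 10**9  # illustrative size, assumed for this sketch
    for n_procs in (1, 32, 1024):
        local_term, procs_term = parallel_time_terms(n_octants, n_procs)
        print(f"n_p={n_procs:5d}  local={local_term:.3e}  n_p-term={procs_term:.3e}")
```

Under these assumptions, at $N = 10^9$ and $n_p = 1024$ the $n_p\log n_p$ term is $1024 \cdot 10 = 10240$, while the $\frac{N}{n_p}\log(\frac{N}{n_p})$ term is on the order of $10^7$, so the per-processor term dominates at this scale.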