A multilevel algorithm for partitioning graphs. Supercomputing '95: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing.
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing.
Multilevel algorithms for multi-constraint graph partitioning. SC '98: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing.
Parallel Multilevel Graph Partitioning. IPPS '96: Proceedings of the 10th International Parallel Processing Symposium.
Proceedings of the Conference on High Performance Computing, Networking, Storage and Analysis.
Adjacency-based data reordering algorithm for acceleration of finite element computations. Scientific Programming.
Parallel hypergraph partitioning for scientific computing. IPDPS '06: Proceedings of the 20th International Conference on Parallel and Distributed Processing.
Controlling Unstructured Mesh Partitions for Massively Parallel Simulations. SIAM Journal on Scientific Computing.
High quality real-time image-to-mesh conversion for finite element simulations. Proceedings of the 27th International ACM Conference on Supercomputing.
High quality real-time Image-to-Mesh conversion for finite element simulations. Journal of Parallel and Distributed Computing.
International Journal of High Performance Computing Applications.
Parallel simulations at extreme scale require that the mesh be distributed across a large number of processors with a balanced workload and minimal inter-part communication. A number of algorithms have been developed to meet these goals, and graph/hypergraph-based methods are by far the most powerful. However, the global implementation of current approaches can fail at very large core counts, and the resulting vertex imbalance is suboptimal, leaving individual cores lightly loaded. These issues are resolved by combining global and local partitioning with an iterative improvement algorithm, LIIPBMod, developed in a previous study (Zhou et al., SIAM J. Sci. Comput. 32:3201-3227, 2010). In the current work, this combined partitioning strategy is applied to simulations at extreme scale, with up to O(10^10) elements on up to O(300K) cores. Strong-scaling studies on IBM Blue Gene/P and Cray XT5 systems demonstrate the effectiveness of the combined partitioning algorithm.
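The core idea behind this kind of local iterative improvement is to migrate boundary vertices between parts whenever a move reduces the edge cut (a proxy for inter-part communication) without violating a balance constraint. The sketch below is only an illustration of that general refinement pattern on a two-way partition of a toy graph; it is not the paper's LIIPBMod algorithm, and all names (`edge_cut`, `refine`, `max_imbalance`) are hypothetical.

```python
def edge_cut(adj, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])

def refine(adj, part, max_imbalance=1.4, passes=10):
    """Greedy local refinement of a 2-way partition (illustrative sketch only,
    not the LIIPBMod algorithm of Zhou et al.): move a boundary vertex to the
    other part when doing so reduces the edge cut and keeps both part sizes
    within the imbalance tolerance."""
    n = len(part)
    sizes = [sum(1 for p in part.values() if p == s) for s in (0, 1)]
    limit = max_imbalance * n / 2          # maximum allowed part size
    for _ in range(passes):
        improved = False
        for u in adj:
            s = part[u]
            # gain = cut edges removed minus cut edges created if u switches parts
            gain = sum(1 if part[v] != s else -1 for v in adj[u])
            if gain > 0 and sizes[1 - s] + 1 <= limit:
                part[u] = 1 - s
                sizes[s] -= 1
                sizes[1 - s] += 1
                improved = True
        if not improved:
            break
    return part

# Toy mesh-dual graph: two triangles joined by one edge, with a poor initial split.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
part = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
part = refine(adj, part)
print(edge_cut(adj, part))  # → 1: only the single bridging edge remains cut
```

In a full multilevel scheme this kind of pass would run after a global (parallel) partitioner has produced the initial distribution, which is the combination the abstract describes.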