The discontinuous finite element discrete ordinates (DFE-Sn) method is widely used to solve the time-dependent neutron transport equation in nuclear science and engineering applications. Its core kernel iteratively sweeps the neutron flux across the computational grid. On unstructured grids, however, this sweep poses several challenges when implemented on distributed-memory parallel computers, where the grid is decomposed across processors. This paper presents a parallel flux sweep algorithm that improves the parallel scalability of the basic sweep algorithm on unstructured grids in a 2-D cylindrical Lagrangian coordinate system in two ways: by prioritizing the sweep order of elements within each subdomain, and by better decomposing the unstructured grid across processors. With an optimal combination of domain decomposition method and priority queuing algorithm, the parallel algorithm has been successfully incorporated into the DFE-Sn method and implemented with MPI to solve the coupled neutron and photon transport equations for complex physics. Performance results are given for two different applications on hundreds of processors of two parallel computers. In particular, the parallel solver achieved a speedup larger than 72 using 92 processors on one machine and larger than 78 using 256 processors on the other.
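The idea of prioritizing the sweep order within a subdomain can be illustrated with a small sketch. It is not the paper's algorithm: it assumes the upwind dependencies for one sweep direction form a directed acyclic graph, and it uses a hypothetical priority value per element (for example, a higher value for elements whose outgoing flux is needed by a neighbouring subdomain). The names `sweep_order`, `downstream`, and `priority` are invented for this illustration.

```python
import heapq


def sweep_order(downstream, priority):
    """Return one valid sweep order for a single direction.

    downstream: dict mapping every element to the list of elements
                that need its outgoing flux (assumed to form a DAG).
    priority:   dict mapping element -> number; among elements that
                are ready, the one with the largest value goes first.
    """
    # Count how many upstream elements each element still waits for.
    waiting = {e: 0 for e in downstream}
    for outs in downstream.values():
        for d in outs:
            waiting[d] += 1

    # heapq is a min-heap, so priorities are negated.
    ready = [(-priority[e], e) for e, n in waiting.items() if n == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, e = heapq.heappop(ready)
        order.append(e)
        # Sweeping e releases its downstream neighbours.
        for d in downstream[e]:
            waiting[d] -= 1
            if waiting[d] == 0:
                heapq.heappush(ready, (-priority[d], d))

    if len(order) != len(downstream):
        raise ValueError("dependency graph contains a cycle")
    return order


# Four elements; element 2 is given a higher (hypothetical) priority,
# e.g. because it feeds a neighbouring subdomain.
downstream = {0: [1, 2], 1: [3], 2: [3], 3: []}
priority = {0: 0, 1: 0, 2: 1, 3: 0}
print(sweep_order(downstream, priority))  # [0, 2, 1, 3]
```

In a distributed setting, each processor would run such a loop on its own subdomain and exchange boundary fluxes with MPI; that communication is omitted here.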