Producing scalable performance with OpenMP: experiments with two CFD applications
Parallel Computing - Special issue on parallel computing in aerospace
Optimizing Finite Element Programs on the Cray X1 Using Coloring Schemes
DOD_UGC '04 Proceedings of the 2004 Users Group Conference
OpenMP on Multicore Architectures
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
A parallel levenberg-marquardt algorithm
Proceedings of the 23rd international conference on Supercomputing
International Journal of Parallel Programming
Parallel training of artificial neural networks using multithreaded and multicore CPUs
ICANNGA'11 Proceedings of the 10th international conference on Adaptive and natural computing algorithms - Volume Part I
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Power/Performance trade-offs of small batched LU based solvers on GPUs
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Hi-index | 0.00 |
Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelisms in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application, and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified to account for over 97% of the total computational time using GPROF. Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects over 90% cache missing rates. With this loop rewritten, similar speedup as the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves using parallel PEST to speedup model calibration. To run calibration on clusters as a single task, the Levenberg-Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100-200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most of the existing groundwater model codes for many applications.