This work investigates the parallel scalability of BRAMS, a limited-area weather forecasting production code, from O(100) to O(1,000) cores on large grids (20 km and 10 km resolution runs over South America). Initial experiments showed a lack of scalability even at modest core counts. Execution-time profiling and source code examination revealed the causes of the limited scalability: sequential algorithms and extensive memory requirements in rarely executed phases of the computation. As the processor count increases, these 'secondary' phases come to dominate execution time. Replacing those algorithms and reducing memory use produced a new code version that exhibits both strong and weak scaling. The new version achieved a speed-up of 6 from 100 to 700 processors on the 20 km resolution grid and a speed-up of 6.9 over the same processor range on the 10 km resolution grid. The results were confirmed on a second machine with a different architecture. Further experiments show that the scalability of the 20 km resolution case is limited by load imbalance in the most demanding computational phase.
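To put the reported speed-ups in context, they can be compared with the ideal speed-up for the same processor range, which is 700 / 100 = 7. The short Python sketch below does this arithmetic. The function name `relative_efficiency` and the definition of efficiency as measured speed-up divided by ideal speed-up relative to the 100-processor baseline are illustrative assumptions, not something taken from the paper.

```python
def relative_efficiency(speedup, base_procs, procs):
    """Parallel efficiency relative to a baseline run on base_procs processors.

    Assumption (not from the paper): efficiency is the measured speed-up
    divided by the ideal speed-up, procs / base_procs.
    """
    ideal_speedup = procs / base_procs
    return speedup / ideal_speedup


# Speed-ups reported in the abstract, both measured from 100 to 700 processors.
reported = {"20 km": 6.0, "10 km": 6.9}

for grid, speedup in reported.items():
    eff = relative_efficiency(speedup, base_procs=100, procs=700)
    print(f"{grid} grid: speed-up {speedup} of ideal 7.0, efficiency {eff:.1%}")
```

Under this definition the 20 km case reaches about 85.7% relative efficiency and the 10 km case about 98.6%. The lower figure for the smaller grid is consistent with the load imbalance noted for the 20 km case.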