Implementation and performance issues of a massively parallel atmospheric model
Parallel Computing - Special issue: climate and weather modeling
The Accumulation of Rounding Errors and Port Validation for Global Atmospheric Models
SIAM Journal on Scientific Computing
SKaMPI: A Detailed, Accurate MPI Benchmark
Proceedings of the 5th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
On Benchmarking Collective MPI Operations
Proceedings of the 9th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
The VPC Trace-Compression Algorithms
IEEE Transactions on Computers
Using compression to improve chip multiprocessor performance
Using compression to improve chip multiprocessor performance
International Journal of High Performance Computing Applications
Hi-index | 0.00 |
In this paper, we study the scalability of an atmospheric modeling application on a cluster with commercially available off-the-shelf interconnects. It is found that interconnects with large latency and low bandwidth are major bottlenecks for performance scalability. Response curves for latency shows that for large message sizes latency is extremely sensitive to the size of the message. Thus, decreasing the message size could reduce the latency and hence improve the scalability. We propose both lossless and lossy (i.e., with loss of some information) compression schemes to reduce message sizes. These compression techniques are investigated for the Community Atmospheric Model (CAM), which is a large scale parallel application used for global climate simulation, on a IBM Power 5 Cluster with Gigabit interconnect. This is a floating point intensive application which involves both point-to-point and collective all-to-all communication of large messages ( 128KB). Floating point data which constitute the messages in CAM application results in 14.8% compression when lossless compression is employed and the speedup improves by about 18% on 32 processors. We further evaluate three lossy compression schemes with very low overheads (0.15%). We study the acceptability criteria for information loss in the lossy compression schemes using a perturbation growth test procedure. The lossy compression schemes achieve a message size reduction of 66.2% and an execution time speedup of up to 20.78 on 32 processors. We also look at the criteria for acceptability of loss of information in lossy compression techniques.