In solving partial differential equations, such as the barotropic equations in ocean models, on distributed-memory computers, finite difference methods are commonly used. Most often, processor subdomain boundaries must be updated at each time step. This boundary update involves many small messages and therefore incurs large communication overhead. Here we propose a new approach that expands the ghost cell layers and thus updates boundaries much less frequently, reducing total message volume and grouping small messages into bigger ones. Together with a technique for eliminating diagonal communications, the method speeds up communication substantially, by up to 170%. We explain the method and its implementation in detail, and provide systematic timing results and performance analysis on the Cray T3E and IBM SP.
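The idea of expanded ghost cell layers can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1D periodic diffusion problem with a 3-point stencil instead of the barotropic equations, it simulates the subdomains inside a single Python process instead of using real message passing, and all function and parameter names (`diffuse_once`, `serial_reference`, `run_decomposed`, `width`, `alpha`) are hypothetical. Because the example is one-dimensional, the technique for eliminating diagonal communications does not arise and is not shown. With a ghost layer `width` cells deep, each stencil step invalidates one more ghost cell per side, so the interior stays correct for `width` steps between exchanges.

```python
def diffuse_once(u, alpha):
    """Apply one explicit 3-point update; the two end cells are copied unchanged."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def serial_reference(u0, alpha, nsteps):
    """Undecomposed periodic solve, used only to check the decomposed run."""
    u, n = list(u0), len(u0)
    for _ in range(nsteps):
        u = [u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n])
             for i in range(n)]
    return u


def run_decomposed(u0, nparts, width, alpha, nsteps):
    """Solve on `nparts` subdomains, exchanging ghosts only every `width` steps.

    Returns (global solution, number of simulated messages).
    """
    n = len(u0)
    assert n % nparts == 0, "sketch assumes equal-sized subdomains"
    m = n // nparts                      # interior cells per subdomain
    assert 1 <= width <= m, "ghost layer must fit inside a neighbor's interior"

    # Local layout: [width ghost cells | m interior cells | width ghost cells]
    locs = [[0.0] * width + list(u0[p * m:(p + 1) * m]) + [0.0] * width
            for p in range(nparts)]
    messages = 0

    for t in range(nsteps):
        if t % width == 0:
            # Boundary update: one message from each neighbor, `width` cells each.
            for p in range(nparts):
                left = locs[(p - 1) % nparts]
                right = locs[(p + 1) % nparts]
                locs[p][:width] = left[m:m + width]        # left neighbor's last interior cells
                locs[p][m + width:] = right[width:2 * width]  # right neighbor's first interior cells
                messages += 2
        # Each step makes one more cell per side stale; after `width` steps
        # only the interior is still correct, which is when we exchange again.
        locs = [diffuse_once(a, alpha) for a in locs]

    result = []
    for a in locs:
        result.extend(a[width:width + m])
    return result, messages


if __name__ == "__main__":
    u0 = [float((i * 7) % 11) for i in range(24)]
    ref = serial_reference(u0, 0.1, 12)
    for w in (1, 2, 3):
        got, msgs = run_decomposed(u0, nparts=4, width=w, alpha=0.1, nsteps=12)
        err = max(abs(a - b) for a, b in zip(got, ref))
        print(f"width={w}: messages={msgs}, max error={err:.2e}")
```

In this sketch, 12 steps on 4 subdomains need 96 messages with a one-cell ghost layer and 32 messages with a three-cell layer, while the result matches the undecomposed solve. The price is redundant computation in the ghost cells, so the best layer width on a real machine depends on how message latency compares with the cost of the extra stencil updates.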