Performance Portability on EARTH: A Case Study across Several Parallel Architectures
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
Hi-index | 0.00 |
Conjugate gradient (CG) is one of the most popular iterative approaches to solving large sparse linear systems of equations. This work reports a parallel implementation of CG on clusters with EARTH multithreaded runtime support. Interphase and intraphase communication costs are balanced using a two-dimensional blocking method, minimizing overall communication costs. EARTH'S adaptive, event-driven multithreaded execution model gives additional opportunities to overlap communication and computation to achieve even better scalability. Experiments on a large Beowulf cluster with gigabit Ethernet show notable improvements over other parallel CG implementations. For example, with the NAS CG benchmark problem size Class C, our implementation achieved a speedup of 41 on a 64-node cluster, compared to 13 for the MPl-based NAS version. The results demonstrate that the combination of the two-dimensional blocking method and the EARTH architectural runtime support helps to compensate for the low communications bandwidth common to most clusters.