The design and implementation of a high-performance communication network are critical factors in determining the performance and cost-effectiveness of a large-scale computing system. The major issues center on the trade-off between network cost and the impact of latency and bandwidth on application performance. One promising technique for extracting maximum application performance from limited network resources is to overlap computation with communication, which partially or entirely hides communication delays. While this approach is not new, few studies quantify the potential benefit of such overlap for large-scale production scientific codes. We address this gap with an empirical method combined with a network model to quantify the potential overlap in several codes and examine the possible performance benefit. Our results demonstrate, for the codes examined, a high potential tolerance to network latency and bandwidth, which follows from a high degree of potential overlap. Moreover, our results indicate that fine-grained communication mechanisms are often unnecessary to achieve this benefit, since the major source of potential overlap is independent work, that is, computation that does not depend on pending messages. This allows a potentially significant relaxation of network requirements without a consequent degradation of application performance.
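To make the idea of independent work concrete, the following is a minimal sketch of a single communication step under a simple latency-plus-bandwidth cost model. It is not the network model or the empirical method used in the paper; the function names, the cost formula, and all numeric values are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: a toy latency/bandwidth model, NOT the paper's model.
# All names and numbers below are hypothetical.

def comm_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to deliver one message: fixed latency plus transfer time."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def step_time(independent_s, dependent_s, comm_s, overlap=True):
    """Time for one step of a code that sends a message and computes.

    independent_s: computation that does not depend on the pending message
    dependent_s:   computation that must wait for the message to arrive
    comm_s:        message delivery time
    """
    if overlap:
        # Independent work proceeds while the message is in flight,
        # so only the longer of the two contributes to the step time.
        return max(independent_s, comm_s) + dependent_s
    # Without overlap, communication and computation are serialized.
    return comm_s + independent_s + dependent_s


if __name__ == "__main__":
    independent, dependent = 4e-3, 1e-3  # seconds of computation (assumed)
    message = 1_000_000                  # bytes (assumed)

    for bandwidth in (1e9, 5e8):         # bytes per second (assumed)
        c = comm_time(message, 10e-6, bandwidth)
        with_overlap = step_time(independent, dependent, c, overlap=True)
        without = step_time(independent, dependent, c, overlap=False)
        print(f"bandwidth={bandwidth:.0e} B/s  comm={c:.5f} s  "
              f"overlap={with_overlap:.5f} s  serialized={without:.5f} s")
```

In this toy setting, halving the bandwidth lengthens the serialized step but leaves the overlapped step unchanged, because the message still arrives before the independent work finishes. This is the sense in which a large amount of independent work can translate into tolerance of weaker network parameters.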