The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Micro benchmark analysis of the KSR1
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
LoGPC: Modeling Network Contention in Message-Passing Programs
IEEE Transactions on Parallel and Distributed Systems
Exploiting Transparent Remote Memory Access for Non-Contiguous- and One-Sided-Communication
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Quantifying Locality Effect in Data Access Delay: Memory logP
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Hi-index | 0.00 |
Communication in parallel applications is a combination of data transfers internally at a source or destination and across the network. Previous research focused on quantifying network transfer costs has indirectly resulted in reduced overall communication cost. Optimized data transfer from source memory to the network interface has received less attention. In shared memory systems, such memory-to-memory transfers dominate communication cost. In distributed memory systems, memory-to-network interface transfers grow in significance as processor and network speeds increase at faster rates than memory latency speeds. Our objective is to minimize the cost of internal data transfers. The following examples illustrating the impact of memory transfers on communication, we present a methodology for classifying the effects of data size and data distribution on hardware, middleware, and application software performance. This cost is quantified using hardware counter event measurements on the SGI Origin 2000. Our analysis technique identifies the critical data paths in point-to-point communication. For the SGI O2K, we empirically identify the cost caused by just copying data from one buffer to another and the middleware overhead. We use MPICH in our experiments, but our techniques are generally applicable to any communication implementation.