Quantification of memory communication

Authors:
Surendra Byna;Kirk W. Cameron;Xian-He Sun
Affiliations:
Department of Computer Science, Illinois Institute of Technology, Chicago, IL;Department of Computer Science, Illinois Institute of Technology, Chicago, IL and Department of Computer Science and Engineering, University of South Carolina, Columbia, SC;Department of Computer Science, Illinois Institute of Technology, Chicago, IL
Venue:
High performance scientific and engineering computing
Year:
2004

Abstract

Communication in parallel applications combines data transfers internal to the source or destination with transfers across the network. Previous research on quantifying network transfer costs has indirectly reduced overall communication cost, but optimizing data transfer from source memory to the network interface has received less attention. In shared memory systems, such memory-to-memory transfers dominate communication cost. In distributed memory systems, memory-to-network-interface transfers grow in significance as processor and network speeds improve faster than memory latency. Our objective is to minimize the cost of internal data transfers. Following examples that illustrate the impact of memory transfers on communication, we present a methodology for classifying the effects of data size and data distribution on hardware, middleware, and application software performance. This cost is quantified using hardware counter event measurements on the SGI Origin 2000. Our analysis technique identifies the critical data paths in point-to-point communication. For the SGI Origin 2000, we empirically identify the cost of simply copying data from one buffer to another, as well as the middleware overhead. We use MPICH in our experiments, but our techniques are generally applicable to any communication implementation.