Quantification of memory communication
High performance scientific and engineering computing
Applying hardware-parameterized models to distributed systems can omit key bottlenecks, such as the full cost of inter-node communication in a shared memory cluster. However, including message characteristics and complex memory hierarchies in the model may make it impractical. Nonetheless, the growing gap between memory and CPU performance, combined with the trend toward large-scale clustered shared memory platforms, implies an increased need to consider the impact of local memory communication on parallel processing in distributed systems. We present a simple and useful model of point-to-point memory communication to predict and analyze the latency of memory copy, pack, and unpack operations. We use the model to isolate the contributions of hardware, middleware, and software to data transfers on Intel- and MIPS-based platforms.
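As a rough illustration of what predicting memory copy latency can look like, the sketch below fits a generic linear cost model, latency = overhead + size / bandwidth, to measured copy times. This is an assumption made for illustration and not the model presented in the paper, whose parameters are not given in this abstract. The names `time_copy`, `fit_linear_latency`, and `predict_latency` are hypothetical.

```python
import time


def time_copy(size, repeats=5):
    """Best-of-`repeats` wall time in seconds to copy `size` bytes."""
    src = bytearray(size)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one contiguous in-memory copy
        best = min(best, time.perf_counter() - start)
    return best


def fit_linear_latency(sizes, times):
    """Least-squares fit of t = overhead + seconds_per_byte * size.

    Assumed linear model for illustration only; returns
    (overhead_seconds, seconds_per_byte).
    """
    n = len(sizes)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (size, time) pairs")
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    seconds_per_byte = sxy / sxx
    overhead = mean_y - seconds_per_byte * mean_x
    return overhead, seconds_per_byte


def predict_latency(overhead, seconds_per_byte, size):
    """Predicted copy latency in seconds for a message of `size` bytes."""
    return overhead + seconds_per_byte * size


if __name__ == "__main__":
    sizes = [1 << k for k in range(12, 22)]  # 4 KiB to 2 MiB
    times = [time_copy(s) for s in sizes]
    overhead, spb = fit_linear_latency(sizes, times)
    print("overhead (s):", overhead)
    print("seconds per byte:", spb)
    print("predicted 1 MiB copy (s):", predict_latency(overhead, spb, 1 << 20))
```

A single straight line will typically fit poorly once message sizes cross cache boundaries, which is one reason a model of memory communication may need to account for the memory hierarchy rather than a single bandwidth term.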