Servet is a suite of benchmarks that extracts a set of parameters with a strong influence on the overall performance of multicore clusters. These parameters can be used to optimize parallel applications by adapting part of their behavior to the characteristics of the machine. Until now, the tool treated network bandwidth as constant and independent of the communication pattern. On modern large supercomputers, however, inter-node communication bandwidth decreases with the number of cores per node that access the network simultaneously and with the distance between the communicating nodes. This paper describes two new benchmarks that extend Servet by characterizing how network performance degrades with these two factors. It also presents experimental results of these benchmarks on a Cray XE6 supercomputer, together with examples of how real parallel codes can be optimized using the network degradation information.
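As a rough illustration of how bandwidth degradation with the number of concurrently communicating cores could be summarized, the following minimal sketch normalizes per-core bandwidth against a single-core baseline. It is not Servet's implementation: the function names `degradation_factors` and `max_cores_within`, the input layout, and the sample figures are all assumptions made for this example, and the numbers are invented rather than taken from the Cray XE6 experiments.

```python
from typing import Dict


def degradation_factors(per_core_bw: Dict[int, float]) -> Dict[int, float]:
    """Normalize per-core bandwidth against the single-core baseline.

    per_core_bw maps "number of cores in a node using the network at once"
    to the bandwidth each core observes (hypothetical layout, any unit).
    """
    if 1 not in per_core_bw:
        raise ValueError("a baseline measurement for 1 core is required")
    baseline = per_core_bw[1]
    if baseline <= 0:
        raise ValueError("baseline bandwidth must be positive")
    return {k: bw / baseline for k, bw in sorted(per_core_bw.items())}


def max_cores_within(factors: Dict[int, float], min_fraction: float) -> int:
    """Largest core count whose per-core bandwidth keeps at least
    min_fraction of the baseline; 0 if no configuration qualifies."""
    acceptable = [k for k, f in factors.items() if f >= min_fraction]
    return max(acceptable) if acceptable else 0


# Invented sample values (MB/s per core), NOT results from the paper.
sample = {1: 1000.0, 2: 900.0, 4: 500.0, 8: 250.0}
factors = degradation_factors(sample)
limit = max_cores_within(factors, 0.5)
```

An application could use a value such as `limit` to cap how many processes per node communicate off-node at the same time, which is one way the kind of adaptation mentioned in the abstract might look in practice.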