Optimization of MPI collectives on clusters of large-scale SMP's
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Automatically tuned collective communications
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Fast Collective Operations Using Shared and Remote Memory Access Protocols on Clusters
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Automatic generation and tuning of MPI collective communication routines
Proceedings of the 19th annual international conference on Supercomputing
Proceedings of the 20th annual international conference on Supercomputing
Performance modeling and analysis of heterogeneous meta-computing systems interconnection networks
Computers and Electrical Engineering
MPI collective communications on the blue gene/p supercomputer: algorithms and optimizations
Proceedings of the 23rd international conference on Supercomputing
Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Process Mapping for MPI Collective Communications
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Automatic Tuning of Discrete Fourier Transforms Driven by Analytical Modeling
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications
PDP '10 Proceedings of the 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing
Computers and Electrical Engineering
UPCBLAS: a library for parallel matrix computations in Unified Parallel C
Concurrency and Computation: Practice & Experience
Static and dynamic job scheduling with communication aware policy in cluster computing
Computers and Electrical Engineering
The Servet 3.0 benchmark suite: Characterization of network performance degradation
Computers and Electrical Engineering
Hi-index | 0.00 |
Servet is a suite of benchmarks focused on detecting a set of parameters with high influence on the overall performance of multicore systems. These parameters can be used for autotuning codes to increase their performance on multicore clusters. Although Servet has been proved to detect accurately cache hierarchies, bandwidths and bottlenecks in memory accesses, as well as the communication overhead among cores, up to now the impact of the use of this information on application performance optimization has not been assessed. This paper presents a novel algorithm that automatically uses Servet for mapping parallel applications on multicore systems and analyzes its impact on three testbeds using three different parallel programming models: message-passing, shared memory and partitioned global address space (PGAS). Our results show that a suitable mapping policy based on the data provided by this tool can significantly improve the performance of parallel applications without source code modification.