Automatic mapping of parallel applications on multicore architectures using the Servet benchmark suite

Authors:
Jorge González-Domínguez;Guillermo L. Taboada;Basilio B. Fraguela;María J. Martín;Juan Touriño
Affiliations:
Computer Architecture Group, Department of Electronics and Systems, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain (all authors)
Venue:
Computers and Electrical Engineering
Year:
2012

Abstract

Servet is a suite of benchmarks that detects a set of parameters with a strong influence on the overall performance of multicore systems. These parameters can be used to autotune codes and increase their performance on multicore clusters. Although Servet has been shown to accurately detect cache hierarchies, bandwidths and bottlenecks in memory accesses, and the communication overhead among cores, the impact of using this information to optimize application performance had not yet been assessed. This paper presents a novel algorithm that automatically uses Servet to map parallel applications onto multicore systems, and analyzes its impact on three testbeds using three different parallel programming models: message passing, shared memory and partitioned global address space (PGAS). Our results show that a suitable mapping policy based on the data provided by this tool can significantly improve the performance of parallel applications without source code modification.
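To make the idea of mapping from measured communication overheads concrete, the following is a minimal sketch and not the algorithm presented in the paper. It assumes two hypothetical inputs: `core_cost`, a symmetric matrix of core-to-core communication overheads of the kind a benchmark suite such as Servet could report, and `traffic`, a symmetric matrix of communication volume between processes. The function names are illustrative only. The sketch exhaustively searches all placements, which is only practical for a handful of processes.

```python
from itertools import permutations


def mapping_cost(placement, traffic, core_cost):
    """Sum of traffic between each process pair, weighted by the
    overhead between the cores they are placed on.
    placement[p] is the core assigned to process p."""
    n = len(placement)
    return sum(
        traffic[p][q] * core_cost[placement[p]][placement[q]]
        for p in range(n)
        for q in range(p + 1, n)
    )


def best_mapping(traffic, core_cost):
    """Exhaustive search over one-process-per-core placements.
    Assumption: the number of processes does not exceed the number of cores."""
    n_procs = len(traffic)
    cores = range(len(core_cost))
    return min(
        permutations(cores, n_procs),
        key=lambda placement: mapping_cost(placement, traffic, core_cost),
    )


# Hypothetical machine: cores {0, 1} and {2, 3} each share a cache
# (overhead 1); communication across the two groups is costlier (10).
core_cost = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]

# Hypothetical application: processes 0 and 2 exchange a lot of data,
# as do processes 1 and 3; every other pair exchanges very little.
traffic = [
    [0, 1, 100, 1],
    [1, 0, 1, 100],
    [100, 1, 0, 1],
    [1, 100, 1, 0],
]

placement = best_mapping(traffic, core_cost)
print("process -> core:", placement)
print("cost:", mapping_cost(placement, traffic, core_cost))
```

In this toy configuration the search places each heavily communicating pair on cores that share a cache, which is the kind of placement decision a mapping policy can make without touching the application source. A realistic policy would also need to account for memory bandwidth contention and would replace the exhaustive search with a heuristic.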