High-performance clusters, built by connecting many computing nodes, are one of the main architectures for achieving extremely high performance. These systems are currently moving from multi-core to many-core architectures to increase their computational capability. This trend is expected to turn network interfaces into a performance bottleneck, because these interfaces are few in number and cannot handle multiple network requests at the same time. The result is longer waiting times in the network interface queue and lower performance. In this paper, we tackle this problem by introducing a process mapping algorithm that aims to improve inter-node communication in multi-core clusters. Our mapping strategy reduces accesses to the network interface by distributing communication-intensive processes among the computing nodes, which shortens the waiting time in the network interface queue. Performance results for synthetic and real workloads show that the proposed strategy improves performance by 8% to 90% in the tested cases compared with other methods.
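The idea of distributing communication-intensive processes among nodes can be illustrated with a small sketch. This is not the algorithm proposed in the paper, whose details the abstract does not give. It is a generic greedy heuristic, shown only to make the intuition concrete. The function names (`spread_by_communication`, `max_node_load`), the per-process `comm_volume` metric, and the assumption that all of a process's traffic is inter-node traffic are hypothetical and chosen for illustration.

```python
from typing import Dict


def spread_by_communication(
    comm_volume: Dict[int, float],
    num_nodes: int,
    cores_per_node: int,
) -> Dict[int, int]:
    """Map each process rank to a node, spreading heavy communicators apart.

    Illustrative greedy heuristic (an assumption, not the paper's method):
    visit processes from heaviest to lightest communicator and place each on
    the node with the smallest accumulated communication load that still has
    a free core.
    """
    if len(comm_volume) > num_nodes * cores_per_node:
        raise ValueError("not enough cores for all processes")

    node_load = [0.0] * num_nodes   # communication volume assigned to each node
    node_used = [0] * num_nodes     # cores already occupied on each node
    mapping: Dict[int, int] = {}

    # Heaviest communicators first; ties are broken by rank for determinism.
    for rank in sorted(comm_volume, key=lambda r: (-comm_volume[r], r)):
        candidates = [n for n in range(num_nodes) if node_used[n] < cores_per_node]
        target = min(candidates, key=lambda n: (node_load[n], n))
        mapping[rank] = target
        node_load[target] += comm_volume[rank]
        node_used[target] += 1
    return mapping


def max_node_load(
    mapping: Dict[int, int], comm_volume: Dict[int, float], num_nodes: int
) -> float:
    """Return the largest per-node communication load, a rough proxy for the
    pressure on the busiest network interface."""
    load = [0.0] * num_nodes
    for rank, node in mapping.items():
        load[node] += comm_volume[rank]
    return max(load)


if __name__ == "__main__":
    # Hypothetical per-process communication volumes (arbitrary units).
    volumes = {0: 100.0, 1: 90.0, 2: 10.0, 3: 5.0}
    spread = spread_by_communication(volumes, num_nodes=2, cores_per_node=2)
    blocked = {0: 0, 1: 0, 2: 1, 3: 1}  # ranks packed onto nodes in order
    print("spread :", spread, max_node_load(spread, volumes, 2))
    print("blocked:", blocked, max_node_load(blocked, volumes, 2))
```

In this toy example, the blocked placement puts both heavy communicators on the same node, so one interface carries 190 units. The greedy placement separates them and the busiest interface carries 105 units. A realistic mapper would also need to account for which processes communicate with each other, since traffic between processes on the same node does not reach the network interface at all.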