An efficient kernel-level blocking MPI implementation

Authors:
Atsushi Hori;Toyohisa Kameyama;Yuichi Tsujita;Mitaro Namiki;Yutaka Ishikawa
Affiliations:
RIKEN AICS, Kobe, Hyogo, Japan;RIKEN AICS, Kobe, Hyogo, Japan;Kinki University, Higashi-Hiroshima, Hiroshima, Japan;Tokyo University of Agriculture and Technology, Fuchu, Tokyo, Japan;RIKEN AICS, Kobe, Hyogo, Japan and The University of Tokyo, Bunkyo-ku, Tokyo, Japan
Venue:
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Year:
2012

Abstract

User-level communication, in which the receiver waits for incoming messages in a busy loop, is used in most MPI implementations to achieve high communication performance. In some cases, however, a kernel-level blocking receive is preferred. Some MPI implementations offer an option to switch from user-level communication to kernel-level blocking, at the cost of communication performance. This paper identifies the problems that arise when implementing a kernel-level blocking receive and proposes several techniques to avoid them. Evaluations show that the proposed kernel-level blocking techniques can achieve performance comparable to that of user-level communication.
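
The two waiting strategies contrasted in the abstract can be illustrated with a minimal sketch. It is not MPI code and not the paper's implementation: a local pipe stands in for the network interface, and all function names (busy_wait_receive, blocking_receive, send_later, demo) are hypothetical, chosen only for this illustration. The busy-wait version repeatedly polls a non-blocking descriptor from user space, while the blocking version sleeps in the kernel until data arrives.

```python
import os
import select
import threading
import time


def busy_wait_receive(fd):
    """User-level style (illustrative): poll a non-blocking descriptor in a busy loop."""
    os.set_blocking(fd, False)
    spins = 0
    while True:
        try:
            return os.read(fd, 64), spins
        except BlockingIOError:
            spins += 1  # no message yet: keep spinning and consuming CPU


def blocking_receive(fd):
    """Kernel-level style (illustrative): sleep in the kernel until data is readable."""
    os.set_blocking(fd, True)
    wakeups = 0
    while True:
        readable, _, _ = select.select([fd], [], [])  # blocks without spinning
        wakeups += 1
        if readable:
            return os.read(fd, 64), wakeups


def send_later(fd, payload, delay):
    """Hypothetical sender: deliver the message after a short delay."""
    time.sleep(delay)
    os.write(fd, payload)


def demo(receive, delay=0.1):
    """Run one receive strategy against a delayed sender on a fresh pipe."""
    read_end, write_end = os.pipe()
    sender = threading.Thread(target=send_later, args=(write_end, b"hello", delay))
    sender.start()
    data, count = receive(read_end)
    sender.join()
    os.close(read_end)
    os.close(write_end)
    return data, count
```

Calling demo(busy_wait_receive) returns the message together with the number of failed polls made while waiting, whereas demo(blocking_receive) returns it together with the number of times the receiver woke up. The busy loop reacts as soon as data is available but occupies a core the whole time; the blocking call frees the core but adds the cost of a kernel wake-up, which is the trade-off the abstract refers to.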