Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering
MPI over uDAPL: Can High Performance and Portability Exist Across Architectures?
CCGRID '06 Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid
Embedded Processor Virtualization for Broadband Grid Computing
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing
According to Moore's Law, transistor counts, and historically computer speeds, double approximately every two years. However, manufacturers now face serious obstacles to increasing the clock speed of individual processors, such as heat dissipation, so multiprocessor architectures have become the norm. This shift has in turn increased interest in standards for writing parallel applications, and the Message Passing Interface (MPI) has become the de facto standard for programming such systems. The growing need for state-of-the-art high performance computing solutions at Saudi Aramco, the world's largest oil producing company, gave us the opportunity to evaluate three of the most commonly used MPI implementations, MVAPICH2, Open MPI, and Intel MPI, on Intel's latest Nehalem processor. In this paper, we describe our test bed environment and the evaluations we performed using a subset of the Intel MPI Benchmarks (IMB). We discuss the results in terms of bandwidth, execution time, and scalability when running the three MPI implementations on up to 512 processors of a 64-node InfiniBand Nehalem Linux cluster. We conclude with our recommendations and directions for future work.