Assessing the Performance of OpenMP Programs on the Intel Xeon Phi

  • Authors:
  • Dirk Schmidl; Tim Cramer; Sandra Wienke; Christian Terboven; Matthias S. Müller

  • Affiliations:
  • Center for Computing and Communication, RWTH Aachen University, Aachen, Germany, JARA High-Performance Computing, Aachen, Germany; Center for Computing and Communication, RWTH Aachen University, Aachen, Germany, JARA High-Performance Computing, Aachen, Germany; Center for Computing and Communication, RWTH Aachen University, Aachen, Germany, JARA High-Performance Computing, Aachen, Germany; Center for Computing and Communication, RWTH Aachen University, Aachen, Germany, JARA High-Performance Computing, Aachen, Germany; Center for Computing and Communication, RWTH Aachen University, Aachen, Germany, Chair for High Performance Computing, RWTH Aachen University, Aachen, Germany, JARA High-Performance Computing, Aache ...

  • Venue:
  • Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
  • Year:
  • 2013

Abstract

The Intel Xeon Phi has been introduced as a new type of compute accelerator that is capable of executing native x86 applications. It supports programming models that are well established in the HPC community, namely MPI and OpenMP, thus removing the need to refactor codes to use accelerator-specific programming paradigms. Because of its native x86 support, the Xeon Phi may also be used stand-alone, meaning that codes can be executed directly on the device without any interaction with a host. In this sense, the Xeon Phi resembles a big SMP on a chip when its 240 logical cores are compared to a common Xeon-based compute node offering up to 32 logical cores. In this work, we compare a Xeon-based two-socket compute node with the stand-alone Xeon Phi in terms of scalability and performance using OpenMP codes. Considered as individual SMP systems, the two have a very similar price and power envelope, but our results show significant differences in absolute application performance and scalability. We also show to what extent common programming idioms for the Xeon multi-core architecture are applicable to the Xeon Phi many-core architecture, and which challenges the changing ratio of core count to single-core performance poses for the application programmer.