An early performance evaluation of many integrated core architecture based SGI rackable computing system

Authors:
Subhash Saini; Haoqiang Jin; Dennis Jespersen; Huiyu Feng; Jahed Djomehri; William Arasin; Robert Hood; Piyush Mehrotra; Rupak Biswas
Affiliations:
NASA Ames Research Center, Moffett Field, CA; NASA Ames Research Center, Moffett Field, CA; NASA Ames Research Center, Moffett Field, CA; SGI Fremont, CA; Computer Sciences Corporation, Moffett Field, CA; Computer Sciences Corporation, Moffett Field, CA; Computer Sciences Corporation, Moffett Field, CA; NASA Ames Research Center, Moffett Field, CA; NASA Ames Research Center, Moffett Field, CA
Venue:
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2013

Abstract

Intel recently introduced the Xeon Phi coprocessor, which is based on the Many Integrated Core (MIC) architecture and features 60 cores with a peak performance of 1.0 Tflop/s. NASA has deployed a 128-node SGI Rackable system in which each node has two 8-core Intel Xeon E5-2670 (Sandy Bridge) processors and two Xeon Phi 5110P coprocessors. We have conducted an early performance evaluation of the Xeon Phi. We used microbenchmarks to measure memory and interconnect latency and bandwidth, I/O rates, and the performance of OpenMP directives and MPI functions. We also used the OpenMP and MPI versions of the NAS Parallel Benchmarks, along with two production CFD applications, to test four programming modes: offload, processor native, coprocessor native, and symmetric (processor plus coprocessor). In this paper we present preliminary results from our evaluation of various aspects of a Phi-based system.
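
The 1.0 Tflop/s peak figure quoted above can be sanity-checked with a back-of-the-envelope calculation. The sketch below is illustrative only and is not taken from the paper. The core count and the node and coprocessor counts come from the abstract. The 1.053 GHz clock, the 8 double-precision vector lanes per core, and the counting of a fused multiply-add as two flops are assumptions based on Intel's published specifications for the Xeon Phi 5110P. All names are hypothetical.

```python
# Rough peak double-precision estimate for a Xeon Phi 5110P.
# Values marked "assumption" do not appear in the abstract.

CORES = 60              # from the abstract
CLOCK_GHZ = 1.053       # assumption: published 5110P core clock
DP_LANES = 8            # assumption: 512-bit vector unit / 64-bit doubles
FLOPS_PER_FMA = 2       # assumption: one fused multiply-add = 2 flops

NODES = 128             # from the abstract
PHI_PER_NODE = 2        # from the abstract


def peak_gflops(cores, clock_ghz, lanes, flops_per_lane):
    """Theoretical peak in Gflop/s: every lane of every core retires
    one FMA per cycle. Real applications achieve less."""
    return cores * clock_ghz * lanes * flops_per_lane


# Peak of a single coprocessor, in Gflop/s.
phi_peak = peak_gflops(CORES, CLOCK_GHZ, DP_LANES, FLOPS_PER_FMA)

# Aggregate peak of all coprocessors in the system, in Tflop/s
# (host Sandy Bridge processors are not included).
system_phi_peak_tflops = NODES * PHI_PER_NODE * phi_peak / 1000.0

print(f"Per-coprocessor peak: {phi_peak:.2f} Gflop/s")
print(f"All coprocessors:     {system_phi_peak_tflops:.2f} Tflop/s")
```

Under these assumptions the per-coprocessor estimate comes out at just over 1 Tflop/s, consistent with the figure quoted in the abstract. It is a theoretical upper bound, not a measured rate.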