Gordon: design, performance, and experiences deploying and supporting a data intensive supercomputer

  • Authors:
  • Shawn M. Strande;Pietro Cicotti;Robert S. Sinkovits;William S. Young;Rick Wagner;Mahidhar Tatineni;Eva Hocks;Allan Snavely;Mike Norman

  • Affiliations:
  • University of California, San Diego, Gilman Drive, La Jolla, California (all authors)

  • Venue:
  • Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond
  • Year:
  • 2012

Abstract

The Gordon data intensive supercomputer entered service in 2012 as an allocable computing system in the NSF Extreme Science and Engineering Discovery Environment (XSEDE) program. Gordon has several innovative features that make it ideal for data intensive computing, including: 1,024 compute nodes based on Intel's Sandy Bridge (Xeon E5) processor; 64 I/O nodes with an aggregate of 300 TB of high performance flash (SSD); large, virtual SMP "supernodes" of up to 2 TB DRAM; a dual-rail, QDR InfiniBand, 3D torus network based on commodity hardware and open source software; and a 100 GB/s Lustre based parallel file system with over 4 PB of disk space. In this paper we present the motivation, design, and performance of Gordon. We provide: low level micro-benchmark results to demonstrate processor, memory, I/O, and network performance; standard HPC benchmarks; and performance on data intensive applications to demonstrate Gordon's performance on typical workloads. We highlight the inherent risks in, and describe mitigation strategies for, deploying a data intensive supercomputer like Gordon, which embodies significant innovative technologies. Finally, we present our experiences thus far in supporting users and managing Gordon.