Characterizing the Performance of "Big Memory" on Blue Gene Linux

  • Authors:
  • Kazutomo Yoshii;Kamil Iskra;Harish Naik;Pete Beckman;P. Chris Broekema

  • Venue:
  • ICPPW '09 Proceedings of the 2009 International Conference on Parallel Processing Workshops
  • Year:
  • 2009

Abstract

Efficient use of Linux for high-performance applications on Blue Gene/P (BG/P) compute nodes is challenging because of severe performance penalties resulting from translation lookaside buffer (TLB) misses and a hard-to-program torus network DMA controller. To address these difficulties, we present the design and implementation of "Big Memory", an alternative, transparent memory space for computational processes. Big Memory uses the extremely large memory pages available on PowerPC CPUs to create a TLB-miss-free, flat memory area that can hold application code and data and is easier to use for DMA operations. One of our single-node memory benchmarks shows that the performance gap between regular PowerPC Linux with 4 KB pages and the IBM BG/P compute node kernel (CNK) is about 68% in the worst case. Big Memory narrows the worst-case performance gap to just 0.04%. We verify this result on 1024 nodes of Blue Gene/P using the NAS Parallel Benchmarks and find that performance under Linux with Big Memory fluctuates within 0.7% of CNK. Originally intended exclusively for compute-node tasks, our new memory subsystem turns out to dramatically improve the performance of certain I/O-node applications as well. We demonstrate this using the central processor of the LOw Frequency ARray (LOFAR) radio telescope as an example.
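
The abstract describes Big Memory as a huge-page-backed, TLB-miss-free region set up by a modified BG/P Linux kernel. The sketch below is not the authors' implementation; it is only a rough user-space analogue on a stock Linux system, showing how an application might request a huge-page-backed mapping through mmap with MAP_HUGETLB so that far fewer TLB entries are needed to cover a large data region. The region size and error handling are illustrative assumptions.

  /* Illustrative sketch only: the paper's Big Memory is built inside the
   * kernel from very large PowerPC pages; this user-space approximation
   * merely requests a huge-page-backed anonymous mapping on ordinary Linux. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/mman.h>

  #define REGION_SIZE (64UL * 1024 * 1024)  /* 64 MB; must be a multiple of the huge page size */

  int main(void)
  {
      /* Anonymous mapping backed by huge pages (default huge page size,
       * e.g. 2 MB on x86-64 or 16 MB on some PowerPC configurations). */
      void *region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
      if (region == MAP_FAILED) {
          /* Fails, for example, if no huge pages are reserved via
           * /proc/sys/vm/nr_hugepages. */
          perror("mmap(MAP_HUGETLB)");
          return EXIT_FAILURE;
      }

      /* Touch the whole region; with huge pages, a handful of TLB entries
       * cover it, instead of thousands of 4 KB entries. */
      memset(region, 0, REGION_SIZE);
      printf("huge-page region mapped at %p\n", region);

      munmap(region, REGION_SIZE);
      return EXIT_SUCCESS;
  }

In the paper's setting the same idea is applied transparently by the kernel, so application code and data land in the large-page region without source changes; the user-space mmap route shown here is only the closest standard-Linux equivalent.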