Performance and Scalability Evaluation of 'Big Memory' on Blue Gene Linux

Authors:
Kazutomo Yoshii;Kamil Iskra;Harish Naik;Pete Beckman;P. Chris Broekema
Affiliations:
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA, and Leadership Computing Facility, Argonne National Laboratory, Argonne, IL, USA;ASTRON, Netherlands Institute for Radio Astronomy, Dwingeloo, The Netherlands
Venue:
International Journal of High Performance Computing Applications
Year:
2011

Citing 12
Cited 9

Practical, transparent operating system support for superpages

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
Practical performance portability in the Parallel Ocean Program (POP): Research Articles

Concurrency and Computation: Practice & Experience - The High Performance Architectural Challenge: Mass Market versus Proprietary Components?
A Performance Model of the Parallel Ocean Program

International Journal of High Performance Computing Applications
Data cache prefetching design space exploration for BlueGene/L supercomputer

SBAC-PAD '05 Proceedings of the 17th International Symposium on Computer Architecture on High Performance Computing
Operating system issues for petascale systems

ACM SIGOPS Operating Systems Review
Designing a highly-scalable operating system: the Blue Gene/L story

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
ZOID: I/O-forwarding infrastructure for petascale architectures

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Benchmarking the effects of operating system interference on extreme-scale parallel machines

Cluster Computing
Evaluating the effect of replacing CNK with Linux on the compute-nodes of Blue Gene/L

Proceedings of the 22nd annual international conference on Supercomputing
Overview of the IBM Blue Gene/P project

IBM Journal of Research and Development
Blue Gene/L programming and operating environment

IBM Journal of Research and Development
The LOFAR correlator: implementation and performance analysis

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

The LOFAR correlator: implementation and performance analysis

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Extending and benchmarking the "Big Memory" implementation on Blue Gene/P Linux

Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers
The LOFAR beam former: implementation and performance analysis

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
ExaScale high performance computing in the square kilometer array

Proceedings of the 2012 workshop on High-Performance Computing for Astronomy Date
Enabling event tracing at leadership-class scale through I/O forwarding middleware

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Better than native: using virtualization to improve compute node performance

Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
Introducing kernel-level page reuse for high performance computing

Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Improving compute node performance using virtualization

International Journal of High Performance Computing Applications
Optimizing I/O forwarding techniques for extreme-scale event tracing

Cluster Computing

Abstract

We address memory performance issues observed in Blue Gene Linux and discuss the design and implementation of 'Big Memory', an alternative, transparent memory space introduced to eliminate the memory performance issues. We evaluate the performance of Big Memory using custom memory benchmarks, NAS Parallel Benchmarks, and the Parallel Ocean Program, at a scale of up to 4,096 nodes. We find that Big Memory successfully resolves the performance issues normally encountered in Blue Gene Linux. For the ocean simulation program, we even find that Linux with Big Memory provides better scalability than does the lightweight compute node kernel designed solely for high-performance applications. Originally intended exclusively for compute node tasks, our new memory subsystem dramatically improves the performance of certain I/O node applications as well. We demonstrate this performance using the central processor of the LOw Frequency ARray radio telescope as an example.