Proceedings of the Second Conference on Hypercube Multiprocessors on Hypercube multiprocessors
Proceedings of the Second Conference on Hypercube Multiprocessors on Hypercube multiprocessors
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Investigation of leading HPC I/O performance using a scientific-application derived benchmark
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Overview of the IBM Blue Gene/P project
IBM Journal of Research and Development
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Scaling parallel I/O performance through I/O delegate and caching system
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Applications
Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Applications
Application level I/O caching on Blue Gene/P systems
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Adaptable, metadata rich IO methods for portable high performance IO
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
I/O performance challenges at leadership scale
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
GPFS: a shared-disk file system for large computing clusters
FAST'02 Proceedings of the 1st USENIX conference on File and storage technologies
Parallel I/O Performance for Application-Level Checkpointing on the Blue Gene/P System
CLUSTER '11 Proceedings of the 2011 IEEE International Conference on Cluster Computing
Delegation-Based I/O Mechanism for High Performance Computing Systems
IEEE Transactions on Parallel and Distributed Systems
Scalable in situ scientific data encoding for analytical query processing
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Hi-index | 0.00 |
Application-level checkpointing has been one of the most popular techniques to proactively deal with unexpected failures in supercomputers with hundreds of thousands of cores. Unfortunately, this approach results in heavy I/O load and often causes I/O bottlenecks in production runs. In this paper, we examine a new thread-based application-level checkpointing for a massively parallel electromagnetic solver system on the IBM Blue Gene/P at Argonne National Laboratory and the Cray XK6 at Oak Ridge National Laboratory. We discuss an I/O-thread based, application-level, two-phase I/O approach, called "threaded reduced-blocking I/O" (threaded rbIO), and compare it with a regular version of "reduced-blocking I/O" (rbIO) and a tuned MPI-IO collective approach (coIO). Our study shows that threaded rbIO can overlap the I/O latency with computation and achieve near-asynchronous checkpoint with an application-perceived I/O performance of over 70 GB/s (raw of 15 GB/s) and 50 GB/s (raw I/O bandwidth of 17 GB/s) on up to 32K processors of Intrepid and Jaguar, respectively. Compared with rbIO and coIO, the threading approach greatly improves the production performance of NekCEM on Blue Gene/P and Cray XK6 machines by significantly reducing the total simulation time from checkpoint blocking reduction. We also discuss the potential strength of this approach with the Scalable Checkpoint Restart library and on other full-featured operating systems such as that to be deployed on the upcoming Blue Gene/Q.