Energy-aware I/O optimization for checkpoint and restart on a NAND flash memory system

Authors:
Takafumi Saito;Kento Sato;Hitoshi Sato;Satoshi Matsuoka
Affiliations:
Tokyo Institute of Technology, Tokyo, Japan;Tokyo Institute of Technology, Tokyo, Japan;Tokyo Institute of Technology, Tokyo, Japan;Tokyo Institute of Technology, Tokyo, Japan
Venue:
Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale
Year:
2013

Abstract

Energy efficiency and system reliability are both significant concerns for exascale high-performance computing. In such large HPC systems, applications must perform massive I/O operations to local storage devices (e.g., NAND flash memory) for scalable checkpoint and restart. However, checkpoint/restart can take up a large portion of the runtime, and non-I/O subsystems such as the CPU and memory consume an enormous amount of energy during it. Thus, energy-aware optimization that includes the I/O operations to storage is required for checkpoint/restart. In this paper, we present a profile-based I/O optimization technique for checkpoint/restart on NAND flash memory devices, based on a Markov model. Our performance studies show that the profile lookup approach can save 4.1% of the energy consumed by an application execution with checkpoint/restart. In particular, our approach reduces the energy consumption of write operations by 67.4% and of read operations by 40.2% on a PCIe-attached NAND flash memory device.
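
The abstract does not say how the profile lookup is implemented, so the following is only a minimal sketch of the general idea under stated assumptions: a profile is assumed to be a table of previously measured energy costs per I/O configuration, and the lookup picks the cheapest configuration for a given operation. The names `IOConfig`, `PROFILE`, and `lookup_best_config`, the choice of tuning parameters (thread count and block size), and all numeric values are hypothetical placeholders, not the authors' design or measurements. The Markov model mentioned in the abstract is not represented here.

```python
from typing import Dict, NamedTuple, Tuple


class IOConfig(NamedTuple):
    """Hypothetical I/O tuning parameters (assumed, not from the paper)."""
    threads: int
    block_size_kib: int


# Hypothetical profile: (operation, configuration) -> energy cost in joules.
# The numbers are made-up placeholders for illustration only.
PROFILE: Dict[Tuple[str, IOConfig], float] = {
    ("write", IOConfig(1, 64)): 12.0,
    ("write", IOConfig(4, 1024)): 5.0,
    ("write", IOConfig(16, 4096)): 7.5,
    ("read", IOConfig(1, 64)): 6.0,
    ("read", IOConfig(4, 1024)): 4.0,
    ("read", IOConfig(16, 4096)): 3.0,
}


def lookup_best_config(
    profile: Dict[Tuple[str, IOConfig], float], operation: str
) -> IOConfig:
    """Return the profiled configuration with the lowest energy cost."""
    candidates = {
        config: energy
        for (op, config), energy in profile.items()
        if op == operation
    }
    if not candidates:
        raise KeyError(f"no profile entries for operation {operation!r}")
    # Pick the configuration whose recorded energy cost is smallest.
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    print(lookup_best_config(PROFILE, "write"))
    print(lookup_best_config(PROFILE, "read"))
```

In this sketch a checkpoint would consult the table with `"write"` and a restart with `"read"`, on the assumption that the two directions can have different energy-optimal settings, which is consistent with the separate write and read figures reported in the abstract.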