PLFS: a checkpoint filesystem for parallel applications

Authors:
John Bent;Garth Gibson;Gary Grider;Ben McClelland;Paul Nowoczynski;James Nunez;Milo Polte;Meghan Wingate
Affiliations:
Los Alamos National Lab;Carnegie Mellon University;Los Alamos National Lab;Los Alamos National Lab;Pittsburgh Supercomputing;Los Alamos National Lab;Carnegie Mellon University;Los Alamos National Lab
Venue:
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Year:
2009

Abstract

Parallel applications running across thousands of processors must protect themselves from inevitable system failures. Many applications insulate themselves from failures by checkpointing. For many applications, checkpointing into a single shared file is most convenient. With such an approach, writes are often small and not aligned with file system boundaries. Unfortunately for these applications, this preferred data layout results in pathologically poor performance from the underlying file system, which is optimized for large, aligned writes to non-shared files. To address this fundamental mismatch, we have developed a virtual parallel log-structured file system, PLFS. PLFS remaps an application's preferred data layout into one that is optimized for the underlying file system. Through testing on PanFS, Lustre, and GPFS, we have seen that this layer of indirection and reorganization can reduce checkpoint time by an order of magnitude for several important benchmarks and real applications without any application modification.
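
The remapping idea can be illustrated with a minimal in-memory sketch. It is not the PLFS implementation: the names `IndexEntry` and `LogStructuredFile`, the per-writer append-only logs, and the sequence-numbered index are all assumptions made for illustration. The sketch shows how small, strided writes to one logical shared file might be turned into sequential appends to non-shared logs, with an index used to reassemble the logical view on read.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """Hypothetical index record mapping a logical extent to a log extent."""
    logical_offset: int  # offset the writer used in the logical shared file
    length: int          # number of bytes written
    writer: int          # id of the per-writer log that holds the bytes
    log_offset: int      # position of the bytes inside that log
    seq: int             # global write order, so later writes win on overlap


@dataclass
class LogStructuredFile:
    """Illustrative model only: one append-only log per writer plus an index."""
    logs: dict = field(default_factory=dict)   # writer id -> bytearray
    index: list = field(default_factory=list)  # list of IndexEntry
    next_seq: int = 0

    def write(self, writer: int, logical_offset: int, data: bytes) -> None:
        # Every write becomes a sequential append to the writer's own log,
        # regardless of where it lands in the logical file.
        log = self.logs.setdefault(writer, bytearray())
        self.index.append(
            IndexEntry(logical_offset, len(data), writer, len(log), self.next_seq)
        )
        self.next_seq += 1
        log.extend(data)

    def read(self, logical_offset: int, length: int) -> bytes:
        # Replay index entries in write order; unwritten holes read as zeros.
        out = bytearray(length)
        read_end = logical_offset + length
        for e in sorted(self.index, key=lambda entry: entry.seq):
            start = max(logical_offset, e.logical_offset)
            end = min(read_end, e.logical_offset + e.length)
            if start >= end:
                continue  # entry does not overlap the requested range
            src = e.log_offset + (start - e.logical_offset)
            chunk = self.logs[e.writer][src:src + (end - start)]
            out[start - logical_offset:end - logical_offset] = chunk
        return bytes(out)


# Three writers interleave 4-byte records into one logical file (strided pattern).
shared = LogStructuredFile()
for step in range(2):
    for rank in range(3):
        record = bytes([ord("A") + rank]) * 4
        shared.write(rank, (step * 3 + rank) * 4, record)

print(shared.read(0, 24))     # logical view, reassembled from the index
print(bytes(shared.logs[0]))  # writer 0's log: its records, stored contiguously
```

In this sketch the cost of the layout change moves to the read path, which must consult the index. That trade-off suits checkpointing under the assumption that checkpoints are written far more often than they are read back.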