Small-file access in parallel file systems

Authors:
Philip Carns; Sam Lang; Robert Ross; Murali Vilayannur; Julian Kunkel; Thomas Ludwig
Affiliations:
Mathematics and Computer Science Division, Argonne National Laboratory, IL 60439, USA; Mathematics and Computer Science Division, Argonne National Laboratory, IL 60439, USA; Mathematics and Computer Science Division, Argonne National Laboratory, IL 60439, USA; VMware Inc., 3401 Hillview Ave., Palo Alto, CA 94304, USA; Institute of Computer Science, University of Heidelberg, Germany; Institute of Computer Science, University of Heidelberg, Germany
Venue:
IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Year:
2009

Abstract

Today's computational science demands have resulted in ever-larger parallel computers, and storage systems have grown to match. Parallel file systems used in this environment are increasingly specialized to extract the highest possible performance for large I/O operations, at the expense of other workloads. While some applications have adapted to I/O best practices and obtain good performance on these systems, the natural I/O patterns of many other applications generate large numbers of small files, and these applications are not well served by current parallel file systems at very large scale. This paper describes five techniques for optimizing small-file access in parallel file systems on very large systems. All five techniques are implemented in a single parallel file system (PVFS) and systematically assessed on two test platforms. A microbenchmark and the mdtest benchmark are used to evaluate the optimizations at an unprecedented scale. Compared to a baseline PVFS configuration on a leadership computing platform using 16,384 cores, we observe improvements of up to 905% in small-file create rates, 1,106% in small-file stat rates, and 727% in small-file removal rates.
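To make the measured quantities concrete, the sketch below times the three metadata-heavy operations the abstract reports (create, stat, and remove of small files) and converts a percentage improvement into a speedup factor. It is a hypothetical single-process illustration on whatever local directory it is given. It is not mdtest and not the paper's microbenchmark, and the helper names `small_file_rates` and `percent_improvement_to_speedup` are invented for this example. The conversion assumes "improvement" means the relative increase (new - baseline) / baseline × 100. Under that assumption, a 905% improvement corresponds to about 10.05 times the baseline rate.

```python
import os
import tempfile
import time


def percent_improvement_to_speedup(percent: float) -> float:
    """Convert a relative improvement, assumed to be (new - base) / base * 100,
    into a speedup factor relative to the baseline."""
    return 1.0 + percent / 100.0


def small_file_rates(directory: str, n_files: int = 1000) -> dict:
    """Hypothetical microbenchmark: time create, stat, and remove of n_files
    empty files in `directory` and return operations per second for each phase."""
    paths = [os.path.join(directory, f"file.{i}") for i in range(n_files)]
    rates = {}

    # Create phase: O_EXCL makes each open fail if the file already exists,
    # so every call is a genuine new-file creation.
    start = time.perf_counter()
    for p in paths:
        fd = os.open(p, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
    rates["create"] = n_files / (time.perf_counter() - start)

    # Stat phase: pure metadata lookups, no file data is read.
    start = time.perf_counter()
    for p in paths:
        os.stat(p)
    rates["stat"] = n_files / (time.perf_counter() - start)

    # Remove phase: unlink every file created above.
    start = time.perf_counter()
    for p in paths:
        os.unlink(p)
    rates["remove"] = n_files / (time.perf_counter() - start)
    return rates


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for op, rate in small_file_rates(d).items():
            print(f"{op:>6}: {rate:,.0f} ops/s")
    # Percentages reported in the abstract, converted under the stated assumption.
    for op, pct in [("create", 905), ("stat", 1106), ("remove", 727)]:
        speedup = percent_improvement_to_speedup(pct)
        print(f"{op}: {pct}% improvement = {speedup:.2f}x baseline")
```

A real parallel evaluation would run many such loops concurrently across client processes against a shared file system directory. A single local loop like this one only shows what each reported rate counts.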