Design and implementation of parallel file aggregation mechanism

Authors:
Jun Kato; Yutaka Ishikawa
Affiliations:
The University of Tokyo; The University of Tokyo
Venue:
Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers
Year:
2011

Abstract

Some high-performance computing (HPC) applications sequentially access millions of files, each several megabytes in size, whose total size can be on the order of a terabyte or more. Although such an application could use a single shared file instead of this huge number of individual files, the single shared file approach is rarely used because of its inherent performance bottleneck. We propose PFA, the Parallel File Aggregation mechanism, which is deployed on compute nodes and promotes the single shared file approach for such applications by enhancing I/O processing on parallel file systems. PFA provides APIs based on memory mapping to avoid copy overhead between the user address space and the file cache in the kernel address space. It aggregates small I/Os into one chunk of data, whose size is nearly the same as that of a file system block, so that the aggregated I/O is transferred through lock-free, direct I/O. PFA also provides an incremental logging feature to enable de-duplication of data. In the MPI-IO Test benchmark, PFA achieves over five times higher write bandwidth and at least double the read bandwidth of the single shared file approach with MPI-IO. In addition, a modified version of Athena, an application that generates a huge number of files, runs 3.8 times faster with PFA than the original program.
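
The aggregation idea described in the abstract can be illustrated with a minimal sketch. This is not the PFA API: the class `ChunkAggregator`, its methods, and the 16-byte chunk size are hypothetical names and values chosen for illustration, and an in-memory `io.BytesIO` stands in for the single shared file. Memory mapping, direct I/O, and incremental logging are deliberately left out; the sketch only shows how many small logical "files" can be packed into block-sized writes while an index records where each one lives.

```python
import io


class ChunkAggregator:
    """Hypothetical illustration of small-write aggregation (not the PFA API)."""

    def __init__(self, shared_file, chunk_size=4096):
        self.shared_file = shared_file  # stands in for the single shared file
        self.chunk_size = chunk_size    # assumed to approximate the FS block size
        self.buffer = bytearray()       # small writes accumulate here
        self.flushed = 0                # bytes already written to the shared file
        self.index = {}                 # logical file name -> (offset, length)
        self.flush_sizes = []           # size of every write actually issued

    def write(self, name, data):
        # Record where this logical file will live inside the shared file.
        self.index[name] = (self.flushed + len(self.buffer), len(data))
        self.buffer += data
        # Issue a write only when a full chunk is available.
        while len(self.buffer) >= self.chunk_size:
            self._flush(self.chunk_size)

    def _flush(self, nbytes):
        self.shared_file.write(bytes(self.buffer[:nbytes]))
        del self.buffer[:nbytes]
        self.flushed += nbytes
        self.flush_sizes.append(nbytes)

    def close(self):
        # Write out whatever is left as a final, possibly short, chunk.
        if self.buffer:
            self._flush(len(self.buffer))

    def read(self, name):
        # Valid after close(), once all buffered data has been flushed.
        offset, length = self.index[name]
        pos = self.shared_file.tell()
        self.shared_file.seek(offset)
        data = self.shared_file.read(length)
        self.shared_file.seek(pos)
        return data


# Five 6-byte logical files packed into one shared file with 16-byte chunks.
shared = io.BytesIO()
agg = ChunkAggregator(shared, chunk_size=16)
for i in range(5):
    agg.write(f"file{i}", bytes([65 + i]) * 6)
agg.close()
```

In this sketch, five small writes become two writes to the shared file, and each logical file can still be read back through the index. A real implementation would additionally need aligned buffers for direct I/O and coordination among processes, which the sketch does not attempt.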