Towards an I/O tracing framework taxonomy
PDSW '07 Proceedings of the 2nd international workshop on Petascale data storage: held in conjunction with Supercomputing '07
PLFS: a checkpoint filesystem for parallel applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
I/O performance challenges at leadership scale
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
High performance multi-node file copies and checksums for clustered file systems
LISA'10 Proceedings of the 24th international conference on Large installation system administration
McrEngine: a scalable checkpointing system using data-aware aggregation and compression
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
McrEngine: A scalable checkpointing system using data-aware aggregation and compression
Scientific Programming - Selected Papers from Super Computing 2012
Hi-index | 0.00 |
Over the last several years there has been a major thrust at the Lawrence Livermore National Laboratory toward building extremely large scale computing clusters based on open source software and commodity hardware. On the storage front, our efforts have focused upon the development of the Lustre file system and bringing it into production in our computer center. Given our customers' requirements, it is assured that we will be living on the bleeding edge with this file system software as we press it into production. A further reality is that our partners are not able to duplicate the scale of systems as required for these testing purposes. For these practical reasons, the onus for file system testing at scale has fallen largely upon us. As an integral part of our testing efforts, we have developed programs for stress and performance testing of parallel file systems. This paper focuses on these unique test programs and upon how we apply them to understand the usage and failure modes of such large-scale parallel file systems.