Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Scale and concurrency of GIGA+: file system directories with millions of files
FAST'11 Proceedings of the 9th USENIX conference on File and storage technologies
Petascale computing systems pose serious scalability challenges for any data storage system. Lustre is a scalable, secure, robust, highly available cluster file system that has been successfully deployed on some of the largest supercomputing systems in the world, including the BlueGene/L supercomputer at the Lawrence Livermore National Laboratory (LLNL), the Red Storm supercluster at Sandia National Laboratories, and the Jaguar supercomputer at the Oak Ridge National Laboratory. This paper gives file system developers insight into how the Lustre file system addresses network file system scalability through policies and algorithms that support distributed lock management, and through options for facilitating recovery after a compute node failure in a large-scale cluster. These design approaches can be applied to scaling other file systems to support large clusters.