Efficient access to many small files in a filesystem for grid computing

  • Authors:
  • Douglas Thain;Christopher Moretti

  • Affiliations:
  • University of Notre Dame, USA;University of Notre Dame, USA

  • Venue:
  • GRID '07 Proceedings of the 8th IEEE/ACM International Conference on Grid Computing
  • Year:
  • 2007

Abstract

Many potential users of grid computing systems need to manage large numbers of small files. However, computing and storage grids are generally optimized for the management of large files. As a result, users with small files achieve performance several orders of magnitude worse than possible. Archival tools and custom storage structures can be used to improve small-file performance, but this requires the end user to change the behavior of the application, which is not always practical. To address this problem, we augment the protocol of the Chirp filesystem for grid computing to improve small-file performance. We describe in detail how this protocol compares to FTP and NFS, which are widely used in similar situations. In addition, we observe that changes to the system call interface are necessary to invoke the protocol properly. We demonstrate an order-of-magnitude performance improvement over existing protocols for copying files and manipulating large directory trees.
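
The gap between large-file and small-file performance described in the abstract can be illustrated with a rough cost model. The sketch below is not the Chirp protocol and uses no figures from the paper. It assumes, for illustration only, that a conventional remote-file protocol pays a fixed number of network round trips per file, while a streamed transfer pays one round trip in total. The function names, the parameter values, and the choice of three round trips per file are all hypothetical.

```python
def per_file_transfer_time(num_files, file_size_bytes, rtt_s,
                           bandwidth_bytes_per_s, round_trips_per_file):
    """Estimate total time when every file pays its own round trips.

    Assumption: each file costs a fixed number of request/response
    round trips (for example open, read, close) plus its data transfer.
    """
    per_file = (round_trips_per_file * rtt_s
                + file_size_bytes / bandwidth_bytes_per_s)
    return num_files * per_file


def streamed_transfer_time(num_files, file_size_bytes, rtt_s,
                           bandwidth_bytes_per_s):
    """Estimate total time when one request streams all files back to back.

    Assumption: a single round trip starts the transfer, after which
    only the link bandwidth limits throughput.
    """
    return rtt_s + num_files * file_size_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical workload: 10,000 files of 1 KB over a 10 ms, 100 MB/s link.
    files, size, rtt, bandwidth = 10_000, 1_000, 0.010, 100_000_000
    slow = per_file_transfer_time(files, size, rtt, bandwidth,
                                  round_trips_per_file=3)
    fast = streamed_transfer_time(files, size, rtt, bandwidth)
    print(f"per-file round trips: {slow:.2f} s")
    print(f"single streamed request: {fast:.2f} s")
    print(f"ratio: {slow / fast:.0f}x")
```

In this model the per-file cost is dominated by latency rather than by data volume, so shrinking the files does not help. Only reducing the number of round trips does, which is the general direction a protocol change for small files would have to take.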