High performance multi-node file copies and checksums for clustered file systems

  • Authors:
  • Paul Z. Kolano; Robert B. Ciotti

  • Affiliations:
  • NASA Advanced Supercomputing Division, NASA Ames Research Center, Moffett Field, CA; NASA Advanced Supercomputing Division, NASA Ames Research Center, Moffett Field, CA

  • Venue:
  • LISA'10: Proceedings of the 24th International Conference on Large Installation System Administration
  • Year:
  • 2010

Abstract

Mcp and msum are drop-in replacements for the standard cp and md5sum programs that use multiple types of parallelism and other optimizations to achieve maximum copy and checksum performance on clustered file systems. Multi-threading is used to keep nodes as busy as possible. Read/write parallelism allows the individual operations of a single copy to be overlapped using asynchronous I/O. Multi-node cooperation allows different nodes to take part in the same copy or checksum. Split file processing allows multiple threads to operate concurrently on the same file. Finally, hash trees allow inherently serial checksums to be performed in parallel. This paper presents the design of mcp and msum and detailed performance numbers for each implemented optimization. The results show that mcp improves cp performance by more than 27×, msum improves md5sum performance by almost 19×, and the combination of mcp and msum speeds up verified copies performed with cp and md5sum by almost 22×.
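
The idea of combining split file processing with a hash tree can be illustrated with a minimal Python sketch. This is not the msum implementation: the function names `hash_range` and `tree_checksum`, the 1 MiB default chunk size, the pairwise tree shape, and the use of a thread pool are all assumptions made for illustration. Each worker hashes one byte range of the file independently, and the per-range digests are then hashed together in pairs until a single root digest remains.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_range(path, offset, length):
    """Hash one byte range of the file (a leaf of the hypothetical hash tree)."""
    # Each worker opens its own handle so seeks do not interfere with each other.
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.md5(f.read(length)).digest()


def tree_checksum(path, chunk_size=1 << 20, workers=4):
    """Return a hex root digest built from per-chunk MD5 digests.

    Assumed layout (illustrative only): leaves are MD5 digests of fixed-size
    chunks; each parent is the MD5 of its children's concatenated digests.
    """
    size = os.path.getsize(path)
    # An empty file still gets one (empty) leaf so that a root always exists.
    offsets = list(range(0, size, chunk_size)) or [0]

    # Leaves are independent of one another, so they can be hashed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        level = list(pool.map(lambda off: hash_range(path, off, chunk_size), offsets))

    # Combine digests pairwise until only the root is left.
    while len(level) > 1:
        level = [
            hashlib.md5(b"".join(level[i:i + 2])).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

For example, `tree_checksum("data.bin", chunk_size=1 << 20, workers=8)` would hash a hypothetical file `data.bin` in 1 MiB ranges using up to eight threads. In this sketch the root digest equals the plain MD5 of the file only when the file fits in a single chunk; for larger files it depends on the chunk size and tree shape, so two parties must agree on both to compare results.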