hashFS: Applying Hashing to Optimize File Systems for Small File Reads

  • Authors:
  • Paul Lensing; Dirk Meister; André Brinkmann

  • Venue:
  • SNAPI '10 Proceedings of the 2010 International Workshop on Storage Network Architecture and Parallel I/Os
  • Year:
  • 2010

Abstract

Today’s file systems typically need multiple disk accesses for a single read operation of a file. In the worst case, when none of the needed data is already in the cache, the metadata for each component of the file path has to be read in. Once the metadata of the file has been obtained, an additional disk access is needed to read the actual file data. For a target scenario consisting almost exclusively of reads of small files, which is typical of many Web 2.0 applications, this behavior severely impacts read performance. In this paper, we propose a new file system approach that computes the expected location of a file by applying a hash function to the file path. Additionally, file metadata is stored together with the actual file data. Together, these characteristics allow a file to be read with only a single disk access. The proposed approach is implemented as an extension of the ext2 file system and remains largely compatible with POSIX semantics. The results show very good random read performance that is nearly independent of the organization and size of the file set and of the available cache size, whereas the performance of standard file systems depends heavily on these parameters.
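
To make the central idea concrete, the following is a minimal C sketch (not the paper's implementation) of mapping a full file path to an expected on-disk block via a path hash, so that metadata and data stored at that block can be fetched with a single disk access. The data-region size, its start offset, and the choice of FNV-1a as the hash function are illustrative assumptions, not values taken from the paper.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout parameters: the size (in blocks) of the
 * hash-addressed data region and its first block. These values are
 * assumptions made only to keep the sketch self-contained. */
#define DATA_REGION_BLOCKS  (1u << 20)
#define DATA_REGION_START   2048u

/* FNV-1a hash over the full file path; a stand-in for whatever hash
 * function the actual file system uses. */
static uint64_t hash_path(const char *path)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a offset basis */
    for (; *path; path++) {
        h ^= (uint64_t)(unsigned char)*path;
        h *= 1099511628211ULL;              /* FNV-1a prime */
    }
    return h;
}

/* Map a path to the block where the file (metadata stored together
 * with data) is expected to live, enabling a one-access read in the
 * common case. */
static uint64_t expected_block(const char *path)
{
    return DATA_REGION_START + hash_path(path) % DATA_REGION_BLOCKS;
}

int main(void)
{
    const char *p = "/photos/2010/img_0001.jpg";
    printf("%s -> block %llu\n", p,
           (unsigned long long)expected_block(p));
    return 0;
}
```

Because the location is derived from the path alone, no per-component metadata lookups along the directory hierarchy are needed; how collisions and files larger than one block are handled is left to the full design described in the paper.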