hashFS: Applying Hashing to Optimize File Systems for Small File Reads

  • Authors:
  • Paul Lensing; Dirk Meister; André Brinkmann

  • Venue:
  • SNAPI '10 Proceedings of the 2010 International Workshop on Storage Network Architecture and Parallel I/Os
  • Year:
  • 2010

Abstract

Today’s file systems typically need multiple disk accesses for a single read operation of a file. In the worst case, when none of the needed data is already in the cache, the metadata for each component of the file path has to be read in. Once the metadata of the file has been obtained, an additional disk access is needed to read the actual file data. For a target scenario consisting almost exclusively of reads of small files, which is typical of many Web 2.0 applications, this behavior severely impacts read performance. In this paper, we propose a new file system approach that computes the expected location of a file by applying a hash function to the file path. Additionally, file metadata is stored together with the actual file data. Together, these characteristics allow a file to be read with only a single disk access. The proposed approach is implemented as an extension of the ext2 file system and remains largely compatible with POSIX semantics. The results show very good random read performance that is nearly independent of the organization and size of the file set and of the available cache size, whereas the performance of standard file systems depends heavily on these parameters.
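
To make the central idea concrete, the following is a minimal C sketch (not the paper's implementation) of mapping a full file path to an expected on-disk block via a path hash, so that metadata and data stored at that block can be fetched with a single disk access. The data-region size, its start offset, and the choice of FNV-1a as the hash function are illustrative assumptions, not values taken from the paper.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout parameters: the size (in blocks) of the
 * hash-addressed data region and its first block. These values are
 * assumptions made only to keep the sketch self-contained. */
#define DATA_REGION_BLOCKS  (1u << 20)
#define DATA_REGION_START   2048u

/* FNV-1a hash over the full file path; a stand-in for whatever hash
 * function the actual file system uses. */
static uint64_t hash_path(const char *path)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a offset basis */
    for (; *path; path++) {
        h ^= (uint64_t)(unsigned char)*path;
        h *= 1099511628211ULL;              /* FNV-1a prime */
    }
    return h;
}

/* Map a path to the block where the file (metadata stored together
 * with data) is expected to live, enabling a one-access read in the
 * common case. */
static uint64_t expected_block(const char *path)
{
    return DATA_REGION_START + hash_path(path) % DATA_REGION_BLOCKS;
}

int main(void)
{
    const char *p = "/photos/2010/img_0001.jpg";
    printf("%s -> block %llu\n", p,
           (unsigned long long)expected_block(p));
    return 0;
}
```

Because the location is derived from the path alone, no per-component metadata lookups along the directory hierarchy are needed; how collisions and files larger than one block are handled is left to the full design described in the paper.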