FARMER: a novel approach to file access correlation mining and evaluation reference model for optimizing peta-scale file system performance

  • Authors:
  • Peng Xia;Dan Feng;Hong Jiang;Lei Tian;Fang Wang

  • Affiliations:
  • Huazhong University of Science and Technology & Wuhan National Laboratory for Optoelectronics, Wuhan, China;Huazhong University of Science and Technology & Wuhan National Laboratory for Optoelectronics, Wuhan, China;University of Nebraska-Lincoln, Lincoln, NE, USA;University of Nebraska-Lincoln, Lincoln, NE, USA;Huazhong University of Science and Technology & Wuhan National Laboratory for Optoelectronics, Wuhan, China

  • Venue:
  • HPDC '08: Proceedings of the 17th International Symposium on High Performance Distributed Computing
  • Year:
  • 2008

Abstract

File correlation, a relationship among related files that can manifest as common access locality (temporal and/or spatial), has become an increasingly important consideration for performance enhancement in peta-scale storage systems. Previous studies on file correlations have mainly been concerned with two aspects of files: file access sequences and semantic attributes. Based on mining these two aspects of file systems, various strategies have been proposed to optimize overall system performance. Unfortunately, all of these studies consider file access sequences and semantic attribute information separately and in isolation, and are thus unable to mine file correlations accurately and effectively, especially in large-scale distributed storage systems. This paper introduces a novel File Access coRrelation Mining and Evaluation Reference model (FARMER) for optimizing peta-scale file system performance. FARMER judiciously considers file access sequences and semantic attributes simultaneously to evaluate the degree of file correlation, leveraging the Vector Space Model (VSM) technique adopted from the Information Retrieval field. We use FARMER to extract file correlation knowledge from several typical file system traces. As a case study, we also incorporate FARMER into a real large-scale object-based storage system, where it dynamically infers file correlations, and we evaluate the benefits and costs of a FARMER-enabled prefetching algorithm for the metadata servers under real file system workloads. Experimental results show that FARMER mines and evaluates file correlations more accurately and effectively. More significantly, the FARMER-enabled prefetching algorithm reduces metadata operation latency by approximately 24-35% compared to a state-of-the-art metadata prefetching algorithm and a commonly used replacement policy.
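To make the abstract's central idea concrete, here is a minimal Python sketch of scoring file correlation by combining an access-sequence signal with a VSM-style semantic similarity. It is not FARMER's actual formulation. The attribute set (owner, directory, extension), the successor window, the linear weight `alpha`, the prefetch threshold and top-k cutoff, and every function name below are hypothetical choices made for illustration. Each file's semantic attributes become a binary term vector compared by cosine similarity, as in the Vector Space Model. The sequence signal is the fraction of a file's near successors in the trace that are the candidate file.

```python
import math
from collections import Counter, defaultdict


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term vectors (VSM-style)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def attribute_vector(attrs: dict[str, str]) -> dict[str, float]:
    """Hypothetical encoding: one binary dimension per attribute=value pair."""
    return {f"{key}={value}": 1.0 for key, value in attrs.items()}


def successor_frequencies(trace: list[str], window: int = 2) -> dict[str, Counter]:
    """Count how often file g is accessed within `window` steps after file f."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for i, f in enumerate(trace):
        for g in trace[i + 1 : i + 1 + window]:
            if g != f:
                counts[f][g] += 1
    return counts


def correlation_degree(f, g, counts, attrs, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of sequence and semantic evidence."""
    total = sum(counts[f].values())
    seq = counts[f][g] / total if total else 0.0
    sem = cosine(attribute_vector(attrs[f]), attribute_vector(attrs[g]))
    return alpha * seq + (1 - alpha) * sem


def prefetch_candidates(f, counts, attrs, threshold: float = 0.4, k: int = 2) -> list[str]:
    """Files whose metadata a server might prefetch when f is requested."""
    scores = {g: correlation_degree(f, g, counts, attrs) for g in attrs if g != f}
    passing = [g for g, s in scores.items() if s >= threshold]
    return sorted(passing, key=lambda g: -scores[g])[:k]


# Toy trace and attributes, invented for illustration only.
trace = ["a.c", "a.h", "b.c", "a.c", "a.h", "notes.txt"]
attrs = {
    "a.c": {"owner": "alice", "dir": "/src", "ext": "c"},
    "a.h": {"owner": "alice", "dir": "/src", "ext": "h"},
    "b.c": {"owner": "bob", "dir": "/src", "ext": "c"},
    "notes.txt": {"owner": "carol", "dir": "/doc", "ext": "txt"},
}
counts = successor_frequencies(trace)
print(prefetch_candidates("a.c", counts, attrs))  # → ['a.h', 'b.c']
```

In this toy run, `b.c` follows `a.c` less often than `a.h` does, but it still clears the threshold because it shares a directory and extension with `a.c`; `notes.txt` follows `a.c` once yet shares no attributes, so it is not prefetched. A metadata server could use a score like this to decide which files' metadata to fetch alongside the one requested, with the threshold and top-k cutoff bounding how much extra metadata each request pulls in.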