A Prefetching Scheme Exploiting both Data Layout and Access History on Disk

  • Authors:
  • Song Jiang;Xiaoning Ding;Yuehai Xu;Kei Davis

  • Affiliations:
  • Wayne State University;New Jersey Institute of Technology;Wayne State University;Los Alamos National Laboratory

  • Venue:
  • ACM Transactions on Storage (TOS)
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Prefetching is an important technique for improving effective hard disk performance. A prefetcher seeks to accurately predict which data will be requested and load it ahead of the arrival of the corresponding requests. Current disk prefetch policies in major operating systems track access patterns at the level of file abstraction. While this is useful for exploiting application-level access patterns, for two reasons file-level prefetching cannot realize the full performance improvements achievable by prefetching. First, certain prefetch opportunities can only be detected by knowing the data layout on disk, such as the contiguous layout of file metadata or data from multiple files. Second, nonsequential access of disk data (requiring disk head movement) is much slower than sequential access, and the performance penalty for mis-prefetching a randomly located block, relative to that of a sequential block, is correspondingly greater. To overcome the inherent limitations of prefetching at logical file level, we propose to perform prefetching directly at the level of disk layout, and in a portable way. Our technique, called DiskSeen, is intended to be supplementary to, and to work synergistically with, any present file-level prefetch policies. DiskSeen tracks the locations and access times of disk blocks and, based on analysis of their temporal and spatial relationships, seeks to improve the sequentiality of disk accesses and overall prefetching performance. It also implements a mechanism to minimize mis-prefetching, on a per-application basis, to mitigate the corresponding performance penalty. Our implementation of the DiskSeen scheme in the Linux 2.6 kernel shows that it can significantly improve the effectiveness of prefetching, reducing execution times by 20%--60% for microbenchmarks and real applications such as grep, CVS, and TPC-H. Even for workloads specifically designed to expose its weaknesses, DiskSeen incurs only minor performance loss.