Mining infrequently-accessed file correlations in distributed file system

  • Authors:
  • Lihua Yu;Gang Chen;Jinxiang Dong

  • Affiliations:
  • College Of Computer Science, Zhejiang University, Hangzhou, Zhejiang, P.R. China;College Of Computer Science, Zhejiang University, Hangzhou, Zhejiang, P.R. China;College Of Computer Science, Zhejiang University, Hangzhou, Zhejiang, P.R. China

  • Venue:
  • APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
  • Year:
  • 2007

Abstract

File correlation mining is a technique for enhancing file system performance: it can be used to improve cache effectiveness, to optimize file layout, and to enable disk file prefetching. While most research on file correlations focuses on traditional stand-alone file systems, this paper investigates the problem of mining file correlations in a distributed environment. We present a parallel data mining algorithm called PFC-Miner (Parallel File Correlation Miner), which is based on Locality Sensitive Hashing. PFC-Miner can efficiently discover correlations between infrequently accessed files, which are more valuable for web applications. Experimental results show that PFC-Miner can efficiently discover file correlations in distributed file systems without compromising accuracy, and that the proposed approach scales well.
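
The abstract does not describe how PFC-Miner works internally, so the following is only a minimal single-machine sketch of the general idea of using Locality Sensitive Hashing to find correlated files regardless of how often they are accessed. It is not the authors' algorithm. It assumes MinHash with banding as the LSH scheme, assumes each file is represented by the non-empty set of integer session IDs in which it was accessed, and uses hypothetical names throughout (`candidate_pairs`, `correlated_files`, `access_log`, the thresholds, and the file names).

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used as the hash modulus


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(session_ids, hash_params):
    """One minimum hash value per hash function over a non-empty set of ints."""
    return [min((a * s + b) % PRIME for s in session_ids)
            for a, b in hash_params]


def candidate_pairs(file_sessions, num_hashes=24, bands=12, seed=0):
    """Return file pairs whose signatures agree on at least one band."""
    hash_params = make_hash_params(num_hashes, seed)
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, sessions in file_sessions.items():
        sig = minhash_signature(sessions, hash_params)
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def correlated_files(file_sessions, threshold=0.6, **lsh_kwargs):
    """Verify LSH candidates with exact Jaccard similarity."""
    result = {}
    for f, g in candidate_pairs(file_sessions, **lsh_kwargs):
        sim = jaccard(file_sessions[f], file_sessions[g])
        if sim >= threshold:
            result[(f, g)] = sim
    return result


# Hypothetical access log: file name -> session IDs in which it was accessed.
access_log = {
    "report.html": {3, 17},                   # accessed only twice
    "report.css": {3, 17},                    # always together with report.html
    "index.html": set(range(100, 160)),       # frequently accessed
    "logo.png": set(range(100, 160)) | {3},
}
found = correlated_files(access_log)
```

In this sketch the rarely accessed pair `report.html` and `report.css` is always reported, because identical session sets yield identical signatures. Similarity is measured relative to each file's own access set, so no minimum access count is needed, which is the property that makes an LSH-style approach attractive for infrequently accessed files. A parallel variant would presumably distribute the signature and bucketing work across nodes, but the abstract gives no details on how PFC-Miner does this.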