A file search method based on intertask relationships derived from access frequency and RMC operations on files

  • Authors:
  • Yi Wu; Kenichi Otagiri; Yousuke Watanabe; Haruo Yokota

  • Affiliations:
  • NTT DATA Corporation and Department of Computer Science, Tokyo Institute of Technology; Cowbell Engineering Corporation and Department of Computer Science, Tokyo Institute of Technology; Global Scientific Information and Computing Center, Tokyo Institute of Technology; Department of Computer Science, Tokyo Institute of Technology

  • Venue:
  • DEXA'11: Proceedings of the 22nd International Conference on Database and Expert Systems Applications - Volume Part I
  • Year:
  • 2011

Abstract

The tremendous growth in the number of files stored in filesystems makes it increasingly difficult to find desired files. Traditional keyword-based search engines cannot retrieve files that do not contain the query keywords. To tackle this problem, we use file-access logs to derive intertask relationships for file search. Our approach rests on two observations: 1) files related to the same task are frequently used together, and 2) a set of Rename, Move, and Copy (RMC) operations tends to initiate a new task. We have implemented a system named SUGOI, which detects two types of tasks, FI tasks and RMC tasks, from file-access logs. An FI task corresponds to a group of files frequently accessed together. An RMC task is generated by RMC operations. The system then constructs a graph of intertask relationships based on the influence of RMC operations and the similarity between tasks. Using the detected tasks and intertask relationships, our system expands the search results of a keyword-based search engine. Experiments using actual file-access logs indicate that the proposed approach significantly improves search results.
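
The idea of grouping files that are frequently accessed together, and then using those groups to expand keyword search results, can be illustrated with a minimal sketch. This is not the SUGOI implementation, and the abstract does not specify its algorithms. The session format, the pairwise co-access counting, the `min_support` threshold, and the function names `co_access_groups` and `expand_results` are all assumptions made for illustration. RMC tasks and the intertask relationship graph are not modeled here.

```python
from collections import Counter
from itertools import combinations


def co_access_groups(sessions, min_support=2):
    """Group files that co-occur in at least `min_support` sessions.

    `sessions` is a hypothetical log format: a list of lists of file
    names, one inner list per working session. This is an assumed
    stand-in for FI-task detection, not the paper's method.
    """
    # Count how often each unordered pair of files shares a session.
    pair_counts = Counter()
    for files in sessions:
        for a, b in combinations(sorted(set(files)), 2):
            pair_counts[(a, b)] += 1

    # Union-find: files linked by a frequent pair end up in one group.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in pair_counts.items():
        if count >= min_support:
            parent[find(a)] = find(b)

    groups = {}
    for f in list(parent):
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())


def expand_results(hits, groups):
    """Add every file that shares a group with a keyword-search hit."""
    hits = set(hits)
    expanded = set(hits)
    for group in groups:
        if hits & group:
            expanded |= group
    return expanded


# Hypothetical sessions; the file names are invented for the example.
sessions = [
    ["report.doc", "budget.xls", "chart.png"],
    ["report.doc", "budget.xls"],
    ["notes.txt", "chart.png"],
    ["budget.xls", "report.doc", "photo.jpg"],
]
groups = co_access_groups(sessions, min_support=2)
expanded = expand_results({"report.doc"}, groups)
```

In this sketch, a keyword hit on `report.doc` also surfaces `budget.xls`, a file that might not contain the query keyword but was used alongside the hit in several sessions. Connecting frequent pairs transitively is a simplification; a real system would likely need a stricter notion of a frequently co-accessed set.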