Using provenance to extract semantic file attributes

  • Authors: Daniel Margo; Robin Smogor
  • Affiliations: Harvard University; Harvard University
  • Venue: TAPP'10 Proceedings of the 2nd conference on Theory and practice of provenance
  • Year: 2010

Abstract

Rich, semantically descriptive file attributes are valuable in many contexts, such as semantic namespaces and desktop search. Descriptive attributes help users find files placed in seemingly arbitrary locations by different applications. However, extracting semantic attributes from file contents is nontrivial. An alternative is to examine file provenance: how and when files are used, and the agents that use them. We study the extraction of semantic attributes from file provenance by applying data mining and machine learning techniques to file metadata. We show that provenance and other metadata predict semantic attributes such as file extensions. This complements previous work, which has shown that file extensions predict access patterns.
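
To make the idea of predicting a file extension from provenance concrete, the following is a minimal sketch. It is not the authors' method: the abstract does not name the techniques used, so a small categorical naive Bayes classifier is swapped in purely for illustration. The feature names (`writer`, `reader`), the training records, and the helper functions `train` and `predict` are all hypothetical.

```python
from collections import Counter, defaultdict
import math

# Hypothetical provenance records: (features, extension). The feature names
# and every value below are invented for illustration, not taken from the paper.
TRAIN = [
    ({"writer": "gcc", "reader": "ld"}, ".o"),
    ({"writer": "gcc", "reader": "ld"}, ".o"),
    ({"writer": "vim", "reader": "gcc"}, ".c"),
    ({"writer": "vim", "reader": "gcc"}, ".c"),
    ({"writer": "vim", "reader": "python"}, ".py"),
    ({"writer": "emacs", "reader": "python"}, ".py"),
]


def train(records):
    """Count labels and per-label feature values (categorical naive Bayes)."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)  # (label, feature name) -> value counts
    vocab = defaultdict(set)               # feature name -> values seen in training
    for features, label in records:
        label_counts[label] += 1
        for name, value in features.items():
            feature_counts[(label, name)][value] += 1
            vocab[name].add(value)
    return label_counts, feature_counts, vocab


def predict(model, features):
    """Return the extension with the highest smoothed log-probability."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior of the extension
        for name, value in features.items():
            seen = feature_counts[(label, name)]
            # Add-one smoothing; the extra +1 reserves mass for unseen values.
            score += math.log((seen[value] + 1) / (count + len(vocab[name]) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, {"writer": "gcc", "reader": "ld"}))
```

With this toy data, a file written by `gcc` and read by `ld` is assigned `.o`, because that combination appears only under that label. A real study would need far richer provenance features and held-out evaluation.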