Harvesting RDF triples

  • Authors: Joe Futrelle
  • Affiliations: National Center for Supercomputing Applications, Urbana, IL
  • Venue: IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data
  • Year: 2006

Abstract

Managing scientific data requires tools that can track complex provenance information about digital resources and workflows. RDF triples are a convenient abstraction for combining independently generated factual statements, including statements about provenance [1]. Harvesting is a strategy for asynchronously acquiring distributed information for aggregation and analysis [2]. Harvesting typically requires that information be temporally scoped and attributed to some creator or information source. An RDF triple asserts a fact without attributing it to any actor or period of time, so the abstraction must be extended to support typical harvesting scenarios. This paper compares standard, conventional, and non-standard means of extending RDF triples to associate them with attribution and timing information. It then considers the implications of these techniques for harvesting and presents some implementation sketches based on a journaling strategy.
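
The idea of pairing each triple with attribution and timing information, and harvesting from a journal, can be illustrated with a minimal sketch. This is not the paper's implementation: the class names (JournalEntry, TripleJournal), the sequence-number scheme, and the ex: identifiers are all hypothetical, chosen only to show how an append-only journal lets a harvester ask for everything recorded since its last visit.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple

# A bare RDF triple: (subject, predicate, object). Plain strings stand in
# for URIs and literals in this sketch.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class JournalEntry:
    """A triple extended with attribution and timing (hypothetical layout)."""
    seq: int               # position in the journal, assigned on append
    triple: Triple
    source: str            # who asserted the triple
    asserted_at: datetime  # when it was asserted


class TripleJournal:
    """Append-only log of attributed, timestamped triples."""

    def __init__(self) -> None:
        self._entries: List[JournalEntry] = []

    def append(self, triple: Triple, source: str,
               asserted_at: datetime) -> JournalEntry:
        entry = JournalEntry(len(self._entries) + 1, triple, source, asserted_at)
        self._entries.append(entry)
        return entry

    def harvest(self, after_seq: int = 0) -> List[JournalEntry]:
        """Return entries recorded after the given sequence number."""
        return [e for e in self._entries if e.seq > after_seq]


journal = TripleJournal()
journal.append(("ex:dataset1", "ex:derivedFrom", "ex:raw1"),
               "ex:workflowA", datetime(2006, 5, 1, tzinfo=timezone.utc))
journal.append(("ex:dataset1", "ex:createdBy", "ex:alice"),
               "ex:workflowB", datetime(2006, 5, 2, tzinfo=timezone.utc))

# First harvest sees everything recorded so far.
first = journal.harvest()

journal.append(("ex:dataset2", "ex:derivedFrom", "ex:dataset1"),
               "ex:workflowA", datetime(2006, 5, 3, tzinfo=timezone.utc))

# A later harvest resumes from the last sequence number already seen.
second = journal.harvest(after_seq=first[-1].seq)
```

Because the journal is append-only, a harvester only needs to remember one number between visits. Filtering by source or by asserted_at would be a straightforward extension under the same assumptions.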