Active data: a data-centric approach to data life-cycle management

  • Authors:
  • Anthony Simonet;Gilles Fedak;Matei Ripeanu;Samer Al-Kiswany

  • Affiliations:
  • INRIA/University of Lyon, France;INRIA/University of Lyon, France;University of British Columbia, Canada;University of British Columbia, Canada

  • Venue:
  • PDSW '13 Proceedings of the 8th Parallel Data Storage Workshop
  • Year:
  • 2013

Abstract

Data-intensive science offers new opportunities for innovation and discovery, provided that large datasets can be handled efficiently. Data management for data-intensive science applications is challenging: it requires support for complex data life cycles, coordination across multiple sites, fault tolerance, and scalability to tens of sites and petabytes of data. In this paper, we argue that data management for these applications requires an approach fundamentally different from the current ad hoc, task-centric one. We propose Active Data, a fundamentally novel paradigm for data life cycle management. Active Data follows two principles: it is data-centric and event-driven. We report on the Active Data programming model and its preliminary implementation, and discuss the benefits and limitations of the approach on recognized, challenging data-intensive science use cases.
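
To make the two principles concrete, here is a minimal Python sketch of what a data-centric, event-driven life cycle could look like. It is an illustration only and not the Active Data API: the class `LifeCycleTracker`, its methods, the state names, and the `ALLOWED` transition table are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical life cycle: the state changes a data item may go through.
# These states are assumptions for illustration, not the paper's model.
ALLOWED: Set[Tuple[str, str]] = {
    ("created", "replicated"),
    ("replicated", "processed"),
    ("processed", "archived"),
}

# A handler receives (data_id, old_state, new_state).
Handler = Callable[[str, str, str], None]


class LifeCycleTracker:
    """Tracks the state of each data item and notifies handlers on transitions."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, new_state: str, handler: Handler) -> None:
        # Event-driven: code reacts to a data item entering a given state.
        self.handlers[new_state].append(handler)

    def create(self, data_id: str) -> None:
        self.state[data_id] = "created"

    def publish(self, data_id: str, new_state: str) -> None:
        # Data-centric: the data item's own life cycle decides what is legal.
        old_state = self.state[data_id]
        if (old_state, new_state) not in ALLOWED:
            raise ValueError(f"illegal transition {old_state} -> {new_state}")
        self.state[data_id] = new_state
        for handler in self.handlers[new_state]:
            handler(data_id, old_state, new_state)


log: List[str] = []
tracker = LifeCycleTracker()
tracker.subscribe("replicated", lambda d, old, new: log.append(f"{d}: {old} -> {new}"))
tracker.create("dataset-1")
tracker.publish("dataset-1", "replicated")
```

In this sketch the unit of management is the data item and its state rather than the task that produced it, and downstream actions are triggered by transitions instead of being scheduled by hand.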