A data lineage model for distributed sub-image processing

  • Authors:
  • Johnson Mwebaze;John McFarland;Danny Booxhorn;Edwin Valentijn

  • Affiliations:
  • Makerere University and University of Groningen, Groningen, The Netherlands;University of Groningen, Groningen, The Netherlands;University of Groningen, Groningen, The Netherlands;University of Groningen, Groningen, The Netherlands

  • Venue:
  • SAICSIT '10 Proceedings of the 2010 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists
  • Year:
  • 2010

Abstract

An important challenge facing e-Science is the development of scalable systems and analysis techniques that allow client applications to locate data and services in increasingly large-scale distributed environments. e-Science systems should achieve three main goals: (i) efficient and selective processing of data; (ii) support for network collaboration without clogging distribution networks; and (iii) transparency of experiments through repeatability and verifiability. Several systems have addressed limited combinations of these properties; this work addresses all three. We describe the architecture and implementation of such a framework in Astro-WISE, an astronomical approach to distributed data processing, discovery and retrieval of datasets that achieves scalability via dynamic linking (data lineage) maintained within the system. We show that lineage data collected during the processing and analysis of datasets can be reused to perform selective reprocessing (at sub-image level) on datasets while the remainder of the dataset is left untouched, a process that is difficult to automate without lineage.
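
The idea of lineage-driven selective reprocessing can be illustrated with a minimal Python sketch. It is not the Astro-WISE implementation or its API: the `LineageRecord` class, the `process_tile` gain step, the `reprocess_selected` helper, and the tile-indexed dictionaries are all hypothetical names chosen for illustration. The sketch assumes an image is split into sub-images (tiles), that each tile's result has a lineage record naming the operation and parameters that produced it, and that reprocessing recomputes only the selected tiles while reusing every other stored result.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical lineage entry: how one sub-image (tile) result was produced."""
    tile: tuple                      # (row, col) index of the sub-image
    operation: str                   # name of the processing step
    params: dict = field(default_factory=dict)


def process_tile(pixels, params):
    """Stand-in processing step (an assumption): scale each pixel by a gain."""
    gain = params.get("gain", 1.0)
    return [[p * gain for p in row] for row in pixels]


def reprocess_selected(raw_tiles, results, lineage, selected, new_params):
    """Recompute only the selected tiles; reuse all other results unchanged.

    Returns new result and lineage mappings; the inputs are not modified.
    """
    new_results = dict(results)
    new_lineage = dict(lineage)
    for tile in selected:
        record = lineage[tile]
        # Start from the recorded parameters and override only what changed.
        params = {**record.params, **new_params}
        new_results[tile] = process_tile(raw_tiles[tile], params)
        new_lineage[tile] = LineageRecord(tile, record.operation, params)
    return new_results, new_lineage


# Two tiles, both first processed with gain 2.0.
raw_tiles = {(0, 0): [[1, 2], [3, 4]], (0, 1): [[5, 6], [7, 8]]}
lineage = {t: LineageRecord(t, "scale", {"gain": 2.0}) for t in raw_tiles}
results = {t: process_tile(raw_tiles[t], lineage[t].params) for t in raw_tiles}

# Reprocess only tile (0, 1) with a new gain; tile (0, 0) is left untouched.
new_results, new_lineage = reprocess_selected(
    raw_tiles, results, lineage, selected=[(0, 1)], new_params={"gain": 3.0}
)
```

In this sketch the untouched tile keeps the very same result object, and the updated lineage record for the reprocessed tile is what would make the new result repeatable and verifiable later.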