Cluster based integration of heterogeneous biological databases using the AutoMed toolkit

  • Authors:
  • Michael Maibaum;Lucas Zamboulis;Galia Rimon;Christine Orengo;Nigel Martin;Alexandra Poulovassilis

  • Affiliations:
  • Department of Biochemistry and Molecular Biology, University College London, London;School of Computer Science and Information Systems, Birkbeck College, University of London, London;School of Computer Science and Information Systems, Birkbeck College, University of London, London;Department of Biochemistry and Molecular Biology, University College London, London;School of Computer Science and Information Systems, Birkbeck College, University of London, London;School of Computer Science and Information Systems, Birkbeck College, University of London, London

  • Venue:
  • DILS'05 Proceedings of the Second International Conference on Data Integration in the Life Sciences
  • Year:
  • 2005

Abstract

This paper presents an extensible architecture that can be used to support the integration of heterogeneous biological data sets. In our architecture, a clustering approach has been developed to support distributed biological data sources with inconsistent identification of biological objects. The architecture uses the AutoMed data integration toolkit to store the schemas of the data sources and the semi-automatically generated transformations from the source data into the data of an integrated warehouse. AutoMed supports bi-directional, extensible transformations which can be used to update the warehouse data as entities change, are added, or are deleted in the data sources. The transformations can also be used to support the addition or removal of entire data sources, or evolutions in the schemas of the data sources or of the warehouse itself. The results of using the architecture for the integration of existing genomic data sets are discussed.
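
The abstract does not specify the clustering algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that records from different sources can be linked by pairwise cross-references, and groups them into clusters by computing connected components with a union-find structure. The function name `cluster_records`, the source names, and the identifiers are all hypothetical.

```python
from collections import defaultdict


def cluster_records(links):
    """Group (source, identifier) records into clusters.

    `links` is an iterable of pairs of records assumed to denote the same
    biological object. Clusters are the connected components of the link
    graph, computed with union-find. (Hypothetical helper, not AutoMed API.)
    """
    parent = {}

    def find(x):
        # Register unseen records as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in links:
        union(a, b)

    # Collect every record under the root of its component.
    clusters = defaultdict(set)
    for record in list(parent):
        clusters[find(record)].add(record)
    return list(clusters.values())


# Hypothetical cross-references between three illustrative sources.
links = [
    (("sourceA", "P1"), ("sourceB", "X9")),
    (("sourceB", "X9"), ("sourceC", "g42")),
    (("sourceA", "P2"), ("sourceC", "g77")),
]
clusters = cluster_records(links)
```

In this sketch, records that are linked only indirectly (here `("sourceA", "P1")` and `("sourceC", "g42")`) still fall into the same cluster, which is one way a single warehouse entity could be associated with inconsistently identified source records.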