Cluster based integration of heterogeneous biological databases using the AutoMed toolkit

Authors:
Michael Maibaum;Lucas Zamboulis;Galia Rimon;Christine Orengo;Nigel Martin;Alexandra Poulovassilis
Affiliations:
Department of Biochemistry and Molecular Biology, University College London, London;School of Computer Science and Information Systems, Birkbeck College, University of London, London;School of Computer Science and Information Systems, Birkbeck College, University of London, London;Department of Biochemistry and Molecular Biology, University College London, London;School of Computer Science and Information Systems, Birkbeck College, University of London, London;School of Computer Science and Information Systems, Birkbeck College, University of London, London
Venue:
DILS'05 Proceedings of the Second international conference on Data Integration in the Life Sciences
Year:
2005

References (9):
Comprehension syntax. ACM SIGMOD Record
A formalisation of semantic schema integration. Information Systems
DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Schema Evolution in Heterogeneous Database Architectures, A Schema Transformation Approach. CAiSE '02 Proceedings of the 14th International Conference on Advanced Information Systems Engineering
Using AutoMed metadata in data warehousing environments. DOLAP '03 Proceedings of the 6th ACM international workshop on Data warehousing and OLAP
DiscoveryLink: a system for integrated access to life sciences data sources. IBM Systems Journal - Deep computing for the life sciences
K2/Kleisli and GUS: experiments in integrated access to genomic data sources. IBM Systems Journal - Deep computing for the life sciences
Transparent access to multiple bioinformatics information sources. IBM Systems Journal - Deep computing for the life sciences
Integration of biological sources: current systems and challenges ahead. ACM SIGMOD Record

Cited by (2):
Data access and integration in the ISPIDER proteomics grid. DILS'06 Proceedings of the Third international conference on Data Integration in the Life Sciences
BioFuice: mapping-based data integration in bioinformatics. DILS'06 Proceedings of the Third international conference on Data Integration in the Life Sciences

Abstract

This paper presents an extensible architecture that can be used to support the integration of heterogeneous biological data sets. In our architecture, a clustering approach has been developed to support distributed biological data sources with inconsistent identification of biological objects. The architecture uses the AutoMed data integration toolkit to store the schemas of the data sources and the semi-automatically generated transformations from the source data into the data of an integrated warehouse. AutoMed supports bi-directional, extensible transformations which can be used to update the warehouse data as entities change, are added, or are deleted in the data sources. The transformations can also be used to support the addition or removal of entire data sources, or evolutions in the schemas of the data sources or of the warehouse itself. The results of using the architecture for the integration of existing genomic data sets are discussed.
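
As a rough illustration of the clustering idea mentioned in the abstract, the sketch below groups records from different sources into clusters when they are judged to denote the same biological object, so that the warehouse can refer to a cluster instead of any one source's identifier. The abstract does not say how equivalence is decided, so the rule used here (an identical sequence checksum), the record fields, and the source names are all assumptions made for illustration. Nothing in the sketch uses the AutoMed API.

```python
from collections import defaultdict

# Hypothetical records: (source name, source-local identifier, sequence checksum).
# The field layout and the values are invented for this example.
RECORDS = [
    ("sourceA", "P1", "abc"),
    ("sourceB", "X9", "abc"),
    ("sourceB", "X10", "def"),
    ("sourceC", "g42", "def"),
    ("sourceC", "g43", "zzz"),
]


def cluster_records(records):
    """Group records that share a checksum into one cluster.

    Assumption: two records denote the same biological object exactly
    when their checksums match. A real system would likely use a richer
    similarity measure.
    """
    by_checksum = defaultdict(list)
    for source, local_id, checksum in records:
        by_checksum[checksum].append((source, local_id))

    # Number the clusters in sorted checksum order so the result is deterministic.
    clusters = {}
    for index, checksum in enumerate(sorted(by_checksum), start=1):
        clusters[f"cluster{index}"] = sorted(by_checksum[checksum])
    return clusters


CLUSTERS = cluster_records(RECORDS)

if __name__ == "__main__":
    for cluster_id, members in CLUSTERS.items():
        print(cluster_id, members)
```

Each cluster identifier then stands in for every source-local identifier it contains, which is one way to tolerate sources that name the same object differently.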