Data access and integration in the ISPIDER proteomics grid

Authors:
Lucas Zamboulis;Hao Fan;Khalid Belhajjame;Jennifer Siepen;Andrew Jones;Nigel Martin;Alexandra Poulovassilis;Simon Hubbard;Suzanne M. Embury;Norman W. Paton
Affiliations:
School of Computer Science and Information Systems, Univ. of London, Birkbeck;School of Computer Science and Information Systems, Univ. of London, Birkbeck;Faculty of Life Sciences, University of Manchester;Faculty of Life Sciences, University of Manchester;Faculty of Life Sciences, University of Manchester;School of Computer Science and Information Systems, Univ. of London, Birkbeck;School of Computer Science and Information Systems, Univ. of London, Birkbeck;Faculty of Life Sciences, University of Manchester;School of Computer Science, University of Manchester;School of Computer Science, University of Manchester
Venue:
DILS'06 Proceedings of the Third international conference on Data Integration in the Life Sciences
Year:
2006

Abstract

Grid computing has great potential for supporting the integration of complex, fast-changing biological data repositories to enable distributed data analysis. One scenario with such potential is provided by proteomics resources, which are being developed rapidly with the emergence of affordable, reliable methods for studying the proteome. The protein identifications arising from these methods derive from multiple repositories, which need to be integrated to enable uniform access to them. A number of technologies exist that enable these resources to be accessed in a Grid environment, but because the resources have been developed independently, significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture that combines Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture to the integration of several autonomous proteomics data resources.
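The integration idea summarised in the abstract (uniform access to autonomous sources whose schemas differ) can be illustrated with a small sketch. This is not the ISPIDER implementation and does not use the OGSA-DAI, OGSA-DQP or AutoMed APIs; the two sources, their column names, the global schema and every function name below are hypothetical, chosen only to show how per-source mappings can hide schema heterogeneity behind a single query interface.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, str]

# Two hypothetical autonomous sources holding protein identifications.
# They describe the same kind of data but use different local schemas.
SOURCE_A: List[Row] = [
    {"accession": "P12345", "pep_seq": "AVLLK"},
    {"accession": "Q67890", "pep_seq": "GTEER"},
]
SOURCE_B: List[Row] = [
    {"protein_id": "P12345", "sequence": "MKTAY"},
]


def map_a(row: Row) -> Row:
    """Translate a row of hypothetical source A into the global schema."""
    return {"accession": row["accession"], "peptide": row["pep_seq"], "source": "A"}


def map_b(row: Row) -> Row:
    """Translate a row of hypothetical source B into the global schema."""
    return {"accession": row["protein_id"], "peptide": row["sequence"], "source": "B"}


# Each source is paired with the mapping that hides its local schema.
# If a source's schema evolves, only its mapping function needs to change.
MAPPINGS: List[Tuple[List[Row], Callable[[Row], Row]]] = [
    (SOURCE_A, map_a),
    (SOURCE_B, map_b),
]


def global_view() -> Iterator[Row]:
    """Yield every source row, rewritten into the assumed global schema."""
    for rows, mapping in MAPPINGS:
        for row in rows:
            yield mapping(row)


def peptides_for(accession: str) -> List[Row]:
    """Answer a query posed against the global schema only."""
    return [row for row in global_view() if row["accession"] == accession]


if __name__ == "__main__":
    for hit in peptides_for("P12345"):
        print(hit)
```

In this sketch the caller queries only the global schema and never sees the local column names. In the architecture the abstract describes, the sources would be remote Grid data services and the query would be evaluated in a distributed fashion rather than over in-memory lists.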