SWARM: a scientific workflow for supporting Bayesian approaches to improve metabolic models

  • Authors:
  • Xinghua Shi; Rick Stevens

  • Affiliations:
  • University of Chicago, Chicago, IL, USA; Argonne National Laboratory, Argonne, IL, USA

  • Venue:
  • CLADE '08: Proceedings of the 6th international workshop on Challenges of Large Applications in Distributed Environments
  • Year:
  • 2008

Abstract

With the exponential growth of complete genome sequences, the analysis of these sequences has become a powerful approach to building genome-scale metabolic models. These models can be used to study individual molecular components and their relationships, and ultimately to study cells as systems. However, constructing genome-scale metabolic models manually is time-consuming and labor-intensive; as a result, far fewer genome-scale metabolic models are available than the hundreds of genome sequences already in hand. To tackle this problem, we design SWARM, a scientific workflow that can be used to improve genome-scale metabolic models in a high-throughput fashion. SWARM addresses a range of issues including the integration of data across distributed resources, data format conversion, data updating, and data provenance. Taken together, SWARM streamlines the whole modeling process: extracting data from various resources, deriving training datasets to train a set of predictors, applying Bayesian techniques to assemble these predictors into an ensemble, performing inference on the ensemble to fill in missing data, and finally improving draft metabolic networks automatically. By enhancing metabolic model construction, SWARM enables scientists to generate many genome-scale metabolic models in a short period of time and with less effort.
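The abstract describes training a set of predictors and combining them with Bayesian techniques to decide which missing reactions to insert into a draft network. The sketch below is not the authors' code; it is a minimal illustration of one such technique, Bayesian model averaging over independently trained classifiers, using hypothetical names (`predictors`, `candidate_features`) and toy data in place of SWARM's real evidence sources.

```python
# Hedged sketch: Bayesian model averaging over several trained predictors to
# score candidate reactions for insertion into a draft metabolic network.
# Toy data stands in for the sequence/annotation evidence SWARM would extract.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Training set: feature vectors for reactions, label 1 = present in curated
# models, 0 = absent (purely synthetic here).
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)

# Train a set of heterogeneous predictors on the same evidence.
predictors = [
    LogisticRegression(max_iter=1000),
    GaussianNB(),
    DecisionTreeClassifier(max_depth=3),
]
for p in predictors:
    p.fit(X_train, y_train)

# Approximate posterior over predictors (uniform prior): weight each model by
# its normalized training-data likelihood.
log_liks = []
for p in predictors:
    probs = p.predict_proba(X_train)[np.arange(len(y_train)), y_train]
    log_liks.append(np.sum(np.log(np.clip(probs, 1e-12, 1.0))))
log_liks = np.array(log_liks)
weights = np.exp(log_liks - log_liks.max())
weights /= weights.sum()

# Inference on the ensemble: posterior probability that each candidate
# gap-filling reaction should be inserted into the draft network.
candidate_features = rng.normal(size=(3, 5))
posterior = sum(w * p.predict_proba(candidate_features)[:, 1]
                for w, p in zip(weights, predictors))
print("P(reaction present):", np.round(posterior, 3))
```

In a workflow setting, each step above (data extraction, training-set derivation, predictor training, ensemble inference) would be a separate workflow task whose inputs and outputs are tracked for provenance; the listing only shows the ensemble-and-inference step in isolation.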