GNARE: an environment for grid-based high-throughput genome analysis

Authors:
D. Sulakhe;A. Rodriguez;M. D'Souza;M. Wilde;V. Nefedova;I. Foster;N. Maltsev
Affiliations:
Math. & Comput. Sci. Div., Argonne Nat. Lab., IL, USA;Math. & Comput. Sci. Div., Argonne Nat. Lab., IL, USA;Math. & Comput. Sci. Div., Argonne Nat. Lab., IL, USA;Math. & Comput. Sci. Div., Argonne Nat. Lab., IL, USA;Math. & Comput. Sci. Div., Argonne Nat. Lab., IL, USA;Math. & Comput. Sci. Div., Argonne Nat. Lab., IL, USA;Math. & Comput. Sci. Div., Argonne Nat. Lab., IL, USA
Venue:
CCGRID '05 Proceedings of the Fifth IEEE International Symposium on Cluster Computing and the Grid - Volume 01
Year:
2005

Citing 0
Cited 5

Experiences with developing and deploying dynamic BLAST
Proceedings of the 15th ACM Mardi Gras conference: From lightweight mash-ups to lambda grids: Understanding the spectrum of distributed computing requirements, applications, tools, infrastructures, interoperability, and the incremental adoption of key capabilities

Analysis of application heartbeats: learning structural and temporal features in time series data for identification of performance problems
Proceedings of the 2008 ACM/IEEE conference on Supercomputing

A mobile agent based workflow rescheduling approach for grids
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems

Exploiting performance characterization of BLAST in the grid
Cluster Computing

Dependable Grid Workflow Scheduling Based on Resource Availability
Journal of Grid Computing

Abstract

Recent progress in genomics and experimental biology has brought exponential growth of the biological information available for computational analysis in public genomics databases. However, applying the potentially enormous scientific value of this information to the understanding of biological systems requires computing and data storage technology of an unprecedented scale. The grid, with its aggregated and distributed computational and storage infrastructure, offers an ideal platform for high-throughput bioinformatics analysis. To leverage this platform, we have developed the Genome Analysis Research Environment (GNARE), a scalable computational system for the high-throughput analysis of genomes, which provides an integrated database and computational backend for data-driven bioinformatics applications. GNARE efficiently automates the major steps of genome analysis, including acquisition of data from multiple genomic databases; data analysis by a diverse set of bioinformatics tools; and storage of results and annotations. High-throughput computations in GNARE are performed using distributed heterogeneous grid computing resources such as Grid2003, TeraGrid, and the DOE Science Grid. Multi-step genome analysis workflows, which involve massive data processing, the use of application-specific tools and algorithms, and updating of an integrated database to provide interactive Web access to results, are all expressed and controlled by a "virtual data" model that transparently maps computational workflows to distributed grid resources. This paper describes how Grid technologies such as Globus, Condor, and the GriPhyN Virtual Data System were applied in the development of GNARE. It focuses on our approach to Grid resource allocation and to the use of GNARE as a computational framework for the development of bioinformatics applications.