Biocompute: towards a collaborative workspace for data intensive bio-science

Authors:
Rory Carmichael;Patrick Braga-Henebry;Douglas Thain;Scott Emrich
Affiliations:
University of Notre Dame, Notre Dame, IN;IMC Financial Markets, Chicago, IL;University of Notre Dame, Notre Dame, IN;University of Notre Dame, Notre Dame, IN
Venue:
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Year:
2010

Abstract

The explosion of data in the biological community demands more scalable and flexible portals for bioinformatic computation. To address this need, we put forth the characteristics required of rigorous, reproducible, and collaborative resources for data-intensive science. Implementing a system with these characteristics exposed challenges in user interface, data distribution, and workflow description and execution. We describe several responses to these challenges. The Data-Action-Queue metaphor addresses user-interface and system-organization concerns. A dynamic data distribution mechanism lays the foundation for the management of persistent datasets. The Makeflow workflow engine facilitates the simple description and execution of complex multipart jobs. The resulting web portal, Biocompute, has been in production use at the University of Notre Dame's Bioinformatics Core Facility since the summer of 2009. Through its three sequence search modules (BLAST, SSAHA, and SHRiMP), it has provided over seven years of CPU time to ten biological and bioinformatic research groups spanning three universities. In this paper we describe the system's goals and interface, its architecture and performance, and the insights gained during its development.