Biocompute 2.0: an improved collaborative workspace for data intensive bio-science

Authors:
Rory Carmichael;Patrick Braga-Henebry;Douglas Thain;Scott Emrich
Affiliations:
Bioinformatics Core Facility, University of Notre Dame, Notre Dame, IN, 46556, USA;IMC Financial Markets, 233 South Wacker Drive #4300, Chicago, IL, 60606, USA;Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA;Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
Venue:
Concurrency and Computation: Practice & Experience
Year:
2011

Citing 0
Cited 2

Resource Management for Elastic Cloud Workflows
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids
Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies

Abstract

The explosion of data in the biological community requires scalable and flexible portals for bioinformatics. To help address this need, we proposed characteristics needed for rigorous, reproducible, and collaborative resources for data-intensive science. Implementing a system with these characteristics exposed challenges in user interface, data distribution, and workflow description and execution. We describe ongoing responses to these and other challenges. Our Data-Action-Queue design pattern addresses user interface and system organization concepts. A dynamic data distribution mechanism lays the foundation for the management of persistent datasets. Makeflow facilitates the simple description and execution of complex multi-part jobs and forms the kernel of a module system powering diverse bioinformatics applications. Our improved web portal, Biocompute 2.0, has been in production use since the summer of 2010. Together with its predecessor, it has provided over 56 years of CPU time to research groups at three universities through five modules (BLAST, SSAHA, SHRIMP, BWA, and SNPEXP). In this paper, we describe the goals and interface of the system, its architecture and performance, and the insights gained in its development. Copyright © 2011 John Wiley & Sons, Ltd.
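
As a rough illustration of what a simple description of a multi-part job can look like, the sketch below assumes Make-style rules (targets, sources, command), the general style Makeflow uses. It is not Biocompute's module code: the tool names `split_fasta` and `search_tool`, the file names, and the helper functions are all hypothetical, and the ordering step only mimics what a workflow engine would derive from the rules.

```python
from graphlib import TopologicalSorter
from typing import NamedTuple


class Rule(NamedTuple):
    """One Make-style rule: `targets` are produced from `sources` by `command`."""
    targets: tuple[str, ...]
    sources: tuple[str, ...]
    command: str


def split_run_join(query: str, n_chunks: int) -> list[Rule]:
    """Build rules for a hypothetical split -> search -> join job."""
    chunks = [f"{query}.{i}" for i in range(n_chunks)]
    hits = [f"{chunk}.out" for chunk in chunks]
    result = f"{query}.result"

    # One rule splits the query file into chunks (hypothetical tool name).
    rules = [Rule(tuple(chunks), (query,), f"split_fasta {query} {n_chunks}")]
    # One independent rule per chunk; these could run in parallel.
    for chunk, hit in zip(chunks, hits):
        rules.append(Rule((hit,), (chunk,), f"search_tool {chunk} > {hit}"))
    # A final rule joins the partial outputs.
    rules.append(Rule((result,), tuple(hits), f"cat {' '.join(hits)} > {result}"))
    return rules


def render(rules: list[Rule]) -> str:
    """Render rules as Make-style text: 'targets: sources', then a tab-indented command."""
    return "\n".join(
        f"{' '.join(r.targets)}: {' '.join(r.sources)}\n\t{r.command}\n" for r in rules
    )


def execution_order(rules: list[Rule]) -> list[Rule]:
    """Return rules in an order where every source is produced before it is used."""
    producer = {t: i for i, r in enumerate(rules) for t in r.targets}
    graph = {
        i: {producer[s] for s in r.sources if s in producer}
        for i, r in enumerate(rules)
    }
    return [rules[i] for i in TopologicalSorter(graph).static_order()]


if __name__ == "__main__":
    job = split_run_join("query.fa", 3)
    print(render(job))
    for rule in execution_order(job):
        print(rule.command)
```

The point of the rule form is that the job description carries only file dependencies and commands, so the per-chunk searches are visibly independent and an engine is free to schedule them on whatever resources are available.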