Biocompute 2.0: an improved collaborative workspace for data intensive bio-science

  • Authors:
  • Rory Carmichael; Patrick Braga-Henebry; Douglas Thain; Scott Emrich

  • Affiliations:
  • Bioinformatics Core Facility, University of Notre Dame, Notre Dame, IN, 46556, USA; IMC Financial Markets, 233 South Wacker Drive #4300, Chicago, IL, 60606, USA; Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA; Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA

  • Venue:
  • Concurrency and Computation: Practice & Experience
  • Year:
  • 2011

Abstract

The explosion of data in the biological community requires scalable and flexible portals for bioinformatics. To help address this need, we proposed characteristics needed for rigorous, reproducible, and collaborative resources for data-intensive science. Implementing a system with these characteristics exposed challenges in user interface design, data distribution, and workflow description and execution. We describe ongoing responses to these and other challenges. Our Data-Action-Queue design pattern addresses user interface and system organization. A dynamic data distribution mechanism lays the foundation for the management of persistent datasets. Makeflow facilitates the simple description and execution of complex multi-part jobs and forms the kernel of a module system powering diverse bioinformatics applications. Our improved web portal, Biocompute 2.0, has been in production use since the summer of 2010. Through Biocompute 2.0 and its predecessor, we have provided over 56 years of CPU time to research groups at three universities via five modules: BLAST, SSAHA, SHRIMP, BWA, and SNPEXP. In this paper, we describe the goals of the system and its interface, its architecture and performance, and the insights gained during its development. Copyright © 2011 John Wiley & Sons, Ltd.
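The abstract names Makeflow as the kernel of the module system but does not show what a workflow looks like. As a minimal sketch, assuming a BLAST-style module that splits a query FASTA file into independent chunks, the Python below writes Make-style Makeflow rules: one BLAST task per chunk plus a final merge step. The file names (`query.N.fa`, `blast.N.out`, `blast.all.out`), the round-robin split, and the helper functions are hypothetical illustrations, not Biocompute's actual module code. The `blastn -query/-db/-out` options are from NCBI BLAST+.

```python
from typing import List


def split_fasta(fasta_text: str, n_chunks: int) -> List[str]:
    """Distribute FASTA records round-robin into n_chunks strings (hypothetical splitter)."""
    records: List[str] = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append(line + "\n")      # a header line starts a new record
        elif records:
            records[-1] += line + "\n"       # sequence lines belong to the current record
    chunks = ["" for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks] += record
    return chunks


def build_makeflow(n_chunks: int, db: str) -> str:
    """Emit Make-style rules: one independent BLAST task per chunk, then a merge."""
    rules: List[str] = []
    outputs: List[str] = []
    for i in range(n_chunks):
        query = f"query.{i}.fa"
        out = f"blast.{i}.out"
        outputs.append(out)
        # Each rule names its target, its input files, and a tab-indented command.
        # Simplification: a real distributed run would also list the database
        # files as inputs so they can be shipped to remote workers.
        rules.append(f"{out}: {query}\n\tblastn -query {query} -db {db} -out {out}\n")
    merged = " ".join(outputs)
    # The merge rule depends on every chunk's output, so it runs last.
    rules.append(f"blast.all.out: {merged}\n\tcat {merged} > blast.all.out\n")
    return "\n".join(rules)


if __name__ == "__main__":
    fasta = ">a\nACGT\n>b\nGGCC\n>c\nTTAA\n"
    for i, chunk in enumerate(split_fasta(fasta, 2)):
        with open(f"query.{i}.fa", "w") as fh:
            fh.write(chunk)
    with open("blast.mf", "w") as fh:
        fh.write(build_makeflow(2, "nt"))
```

Because every chunk rule depends only on its own query file, a workflow engine such as Makeflow can run those tasks in parallel and start the merge rule only after all of them finish. The generated file would then be handed to the `makeflow` command.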