Biocompute: towards a collaborative workspace for data intensive bio-science

  • Authors: Rory Carmichael; Patrick Braga-Henebry; Douglas Thain; Scott Emrich

  • Affiliations: University of Notre Dame, Notre Dame, IN; IMC Financial Markets, Chicago, IL; University of Notre Dame, Notre Dame, IN; University of Notre Dame, Notre Dame, IN

  • Venue: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
  • Year: 2010

Abstract

The explosion of data in the biological community demands more scalable and flexible portals for bioinformatic computation. To address this need, we put forth the characteristics needed for rigorous, reproducible, and collaborative resources for data-intensive science. Implementing a system with these characteristics exposed challenges in user interface, data distribution, and workflow description and execution. We describe several responses to these challenges. The Data-Action-Queue metaphor addresses user interface and system organization concepts. A dynamic data distribution mechanism lays the foundation for the management of persistent datasets. The Makeflow workflow engine facilitates the simple description and execution of complex multipart jobs. The resulting web portal, Biocompute, has been in production use at the University of Notre Dame's Bioinformatics Core Facility since the summer of 2009. It has provided over seven years of CPU time through its three sequence search modules (BLAST, SSAHA, and SHRIMP) to ten biological and bioinformatic research groups spanning three universities. In this paper we describe the goals and interface of the system, its architecture and performance, and the insights gained in its development.
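
As an illustration of what describing a multipart job can look like, the following is a minimal sketch, not the Biocompute implementation. It assumes a sequence search can be split into independent chunks, and it emits generic Make-style rules (targets, sources, and a tab-indented command), which is the general style of description that Makeflow builds on. The executables `./split_query` and `./search`, the file naming scheme, and the function name `make_rules` are hypothetical and chosen only for this example.

```python
def make_rules(query, database, parts):
    """Return Make-style rules for a hypothetical split / search / join job."""
    chunks = [f"{query}.{i}" for i in range(parts)]
    results = [f"{chunk}.out" for chunk in chunks]
    rules = []

    # One rule splits the query file into independent chunks.
    # "./split_query" is a placeholder command, not a real tool.
    rules.append((chunks, [query], f"./split_query {query} {parts}"))

    # One rule per chunk. These rules do not depend on each other,
    # so a workflow engine is free to run them in parallel.
    # "./search" is likewise a placeholder for a sequence search program.
    for chunk, result in zip(chunks, results):
        command = f"./search {database} {chunk} > {result}"
        rules.append(([result], [chunk, database], command))

    # A final rule concatenates the partial results in chunk order.
    join = "cat " + " ".join(results) + " > results.out"
    rules.append((["results.out"], results, join))

    # Render each rule as "targets: sources", then a tab-indented command.
    return "\n\n".join(
        f"{' '.join(targets)}: {' '.join(sources)}\n\t{command}"
        for targets, sources, command in rules
    ) + "\n"


if __name__ == "__main__":
    print(make_rules("query.fasta", "nt.db", 3))
```

Stating each step as a rule with explicit inputs and outputs is what lets an engine decide which steps can run concurrently and which must wait, without the job author writing any scheduling logic.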