The explosion of data in the biological community demands more scalable and flexible portals for bioinformatic computation. To address this need, we propose a set of characteristics required of rigorous, reproducible, and collaborative resources for data-intensive science. Implementing a system with these characteristics exposed challenges in user interface design, data distribution, and workflow description and execution. We describe several responses to these challenges. The Data-Action-Queue metaphor addresses user interface and system organization concepts. A dynamic data distribution mechanism lays the foundation for managing persistent datasets. The Makeflow workflow engine simplifies the description and execution of complex multipart jobs. The resulting web portal, Biocompute, has been in production use at the University of Notre Dame's Bioinformatics Core Facility since the summer of 2009. It has provided over seven years of CPU time through its three sequence search modules (BLAST, SSAHA, and SHRIMP) to ten biological and bioinformatic research groups spanning three universities. In this paper we describe the system's goals and interface, its architecture and performance, and the insights gained during its development.