Next-generation DNA sequencing (NGS) platforms produce significantly more data than early Sanger sequencers. Beyond the data-management challenges that arise from these unprecedented volumes, the data must also be analyzed effectively. In this paper, we use BFAST, a genome-wide mapping application, as a representative example of the analysis typically required on data from NGS machines. We investigate two model genomes, the human genome and that of a microbe (Burkholderia glumae), which represent a eukaryotic and a prokaryotic system, respectively. The computational complexity of genome-wide mapping using BFAST depends, among other factors, on the size of the reference genome and the data size of the short reads. We analyze the performance characteristics of BFAST and its dependency on different input parameters. This characterization suggests that genome-wide mapping benefits from both scaling up (increased fine-grained parallelism) and scaling out (task-level parallelism, both local and distributed). For certain problem instances, scaling out can be more efficient than scaling up. We then design, develop, and demonstrate a runtime environment that supports both the scale-up and the scale-out of BFAST on production grid and cloud environments.
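The scale-out idea described above can be illustrated with a minimal sketch. It is not BFAST and not the runtime environment developed in the paper: the function names (split_reads, map_chunk, scale_out_map) are hypothetical, the "mapper" is a toy exact-match lookup, and local threads stand in for the independent tasks that a grid or cloud scheduler would place on separate nodes. What it shows is only the task-level pattern: partition the short reads, map each partition independently against the same reference, and merge the results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_reads(reads, n_chunks):
    """Split the read list into at most n_chunks contiguous, near-equal chunks."""
    size = max(1, -(-len(reads) // n_chunks))  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]


def map_chunk(reference, chunk):
    """Toy stand-in for one mapping task (assumption: exact match only).

    Returns (read, position) pairs; position is -1 when the read is not found.
    """
    return [(read, reference.find(read)) for read in chunk]


def scale_out_map(reference, reads, n_tasks=4):
    """Run one independent mapping task per chunk and merge in input order."""
    chunks = split_reads(reads, n_tasks)
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        # Each chunk is mapped independently; no task needs another's output.
        partial = pool.map(lambda c: map_chunk(reference, c), chunks)
        return [hit for part in partial for hit in part]


if __name__ == "__main__":
    reference = "ACGTACGGTCA"
    reads = ["ACGT", "GGTC", "TTTT", "CGG", "TCA"]
    for read, pos in scale_out_map(reference, reads, n_tasks=2):
        print(read, pos)
```

Because the chunks share no state, the number of tasks can be varied independently of how many threads each task uses internally, which is the distinction between scaling out and scaling up drawn in the abstract.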