A Comprehensive Analysis Workflow for Genome-Wide Screening Data from ChIP-Sequencing Experiments
BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology
A moldable online scheduling algorithm and its application to parallel short sequence mapping
JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
Optimizing the stretch of independent tasks on a cluster: From sequential tasks to moldable tasks
Journal of Parallel and Distributed Computing
Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
With the advent of next-generation high-throughput sequencing instruments, large volumes of short sequence data are generated at an unprecedented rate. Processing and analyzing these massive datasets requires overcoming several challenges, including mapping the generated short sequences to a reference genome. On large-scale datasets, this computationally intensive process takes on the order of days with existing sequential techniques. In this work, we propose six parallelization methods that speed up short sequence mapping and reduce the execution time for such datasets to just a few hours. We compare these methods and give a theoretical cost model for each. Experimental results on real datasets demonstrate the effectiveness of the parallel methods and indicate that the cost models enable accurate estimation of parallel execution time. Based on these cost models, we implemented a selection function that predicts the best method for a given scenario. To the best of our knowledge, this is the first study on the parallelization of the short sequence mapping problem.
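
The idea of a selection function driven by cost models can be sketched as follows. This is only a minimal illustration: the abstract does not state the six methods or their cost models, so the `Scenario` fields, the two toy models, and all constants below are hypothetical placeholders, not the authors' formulas.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Scenario:
    """Hypothetical description of one mapping job (not from the paper)."""
    num_reads: int      # number of short sequences to map
    genome_size: int    # reference genome length in bases
    num_nodes: int      # compute nodes available


# A cost model maps a scenario to a predicted execution time in seconds.
CostModel = Callable[[Scenario], float]


def toy_split_reads(s: Scenario) -> float:
    # Assumed toy model: every node loads the whole genome,
    # then maps only its own share of the reads.
    return 1e-6 * s.genome_size + 1e-3 * s.num_reads / s.num_nodes


def toy_split_genome(s: Scenario) -> float:
    # Assumed toy model: every node loads a share of the genome,
    # but must scan all reads against that share.
    return 1e-6 * s.genome_size / s.num_nodes + 2e-4 * s.num_reads


# Registry of candidate methods; a real system would hold one entry per method.
MODELS: Dict[str, CostModel] = {
    "toy_split_reads": toy_split_reads,
    "toy_split_genome": toy_split_genome,
}


def select_method(models: Dict[str, CostModel], scenario: Scenario) -> str:
    """Return the name of the method with the lowest predicted cost."""
    return min(models, key=lambda name: models[name](scenario))


if __name__ == "__main__":
    few_reads = Scenario(num_reads=1_000_000,
                         genome_size=3_000_000_000, num_nodes=16)
    many_reads = Scenario(num_reads=100_000_000,
                          genome_size=3_000_000_000, num_nodes=16)
    print(select_method(MODELS, few_reads))
    print(select_method(MODELS, many_reads))
```

Keeping each cost model as a plain function means a further method can be added by registering one more entry, and the selection step stays a single minimum over predicted times.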