Next Generation Sequencing (NGS) technologies can read millions of short DNA sequences quickly and cheaply. While these technologies are already used to resequence individuals once a reference genome exists, it has not been shown whether they can be used for ab initio genome assembly. In this paper, we give a novel network flow-based algorithm that, by taking advantage of the high coverage provided by NGS, accurately estimates the copy counts of repeats in a genome. We also give a second algorithm that combines the predicted copy counts with mate-pair data to assemble the reads into contigs. We run our algorithms on simulated read data from E. coli and predict copy counts with extremely high accuracy while assembling long contigs.
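The abstract does not spell out the flow formulation, so the sketch below is not the paper's algorithm. It is only a naive per-segment baseline, included to illustrate why high coverage carries copy-count information: if reads start roughly uniformly along the genome, a segment present in c copies should collect about c times as many read starts as a single-copy segment of the same length. The function name `naive_copy_counts`, the `segments` layout, and the `reads_per_base` parameter are all hypothetical names chosen for this illustration.

```python
def naive_copy_counts(segments, reads_per_base):
    """Estimate a copy count for each segment from its read depth alone.

    segments: dict mapping a segment name to (length_bp, observed_read_starts).
    reads_per_base: assumed average number of read starts per base of
        single-copy sequence (total reads / genome length).

    This is an illustrative baseline, not a network flow method: each
    segment is estimated independently of its neighbours.
    """
    estimates = {}
    for name, (length_bp, read_starts) in segments.items():
        # Read starts we would expect if the segment occurred exactly once.
        expected_single_copy = length_bp * reads_per_base
        # Round the depth ratio to the nearest integer, with a floor of 1
        # because every observed segment occurs at least once.
        estimates[name] = max(1, round(read_starts / expected_single_copy))
    return estimates


if __name__ == "__main__":
    toy_segments = {
        "unique": (1000, 210),  # about 1x the single-copy depth
        "repeat": (500, 290),   # about 3x the single-copy depth
    }
    print(naive_copy_counts(toy_segments, reads_per_base=0.2))
```

Because each segment is rounded on its own, estimates from this baseline can contradict one another across adjacent segments. Presumably a flow-based formulation, as described in the abstract, addresses this by estimating the counts jointly, but the details are not given here.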