Ab initio whole genome shotgun assembly with mated short reads

Authors:
Paul Medvedev; Michael Brudno
Affiliations:
Department of Computer Science, University of Toronto, Canada; Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada
Venue:
RECOMB'08 Proceedings of the 12th annual international conference on Research in computational molecular biology
Year:
2008

Abstract

Next Generation Sequencing (NGS) technologies can read millions of short DNA sequences both quickly and cheaply. These technologies are already used to resequence individuals once a reference genome exists, but it has not been shown whether they can be used for ab initio genome assembly. In this paper, we give a novel network flow-based algorithm that takes advantage of the high coverage provided by NGS to accurately estimate the copy counts of repeats in a genome. We also give a second algorithm that combines the predicted copy counts with mate-pair data to assemble the reads into contigs. We run our algorithms on simulated read data from E. coli and predict copy counts with extremely high accuracy, while assembling long contigs.
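The abstract only states that copy counts are estimated with a network flow. The sketch below shows one way such a formulation can work, under assumptions of our own. Each edge of a toy repeat graph carries an observed read count. The goal is to find integer copy counts that satisfy flow conservation at every node, which is a circulation and suits a circular genome, while fitting those counts. The cost function `edge_cost`, the per-copy read rate `lam`, the penalty `BIG`, the cap `max_copies`, and all function names are hypothetical. The sketch uses an ordinary directed graph, and the paper's actual graph model and cost may differ. The solver is plain cycle canceling on a residual graph of marginal costs. It was chosen for brevity and is not necessarily the authors' method.

```python
"""Toy copy-count estimation as a min-cost circulation (hypothetical sketch).

Edges are (tail, head, observed_reads). We look for integer copy counts that
obey flow conservation at every node and fit the observed reads under an
ASSUMED convex cost, using negative-cycle canceling.
"""

BIG = 1e6  # hypothetical penalty: every edge should occur at least once


def edge_cost(obs, copies, lam):
    """Assumed convex penalty (not the authors' cost): squared deviation of
    the observed reads from lam * copies; zero copies is heavily penalized."""
    if copies == 0:
        return BIG
    return float((obs - lam * copies) ** 2)


def residual_arcs(edges, flow, lam, max_copies):
    """Residual arcs (tail, head, marginal cost, edge index, +1 or -1)."""
    arcs = []
    for i, (u, v, obs) in enumerate(edges):
        x = flow[i]
        here = edge_cost(obs, x, lam)
        if x < max_copies:  # option: add one copy of edge i
            arcs.append((u, v, edge_cost(obs, x + 1, lam) - here, i, +1))
        if x > 0:  # option: remove one copy of edge i
            arcs.append((v, u, edge_cost(obs, x - 1, lam) - here, i, -1))
    return arcs


def find_negative_cycle(nodes, arcs):
    """Bellman-Ford with every distance starting at 0 (an implicit virtual
    source). Returns the arcs of a negative cycle, or None if none exists."""
    dist = {n: 0.0 for n in nodes}
    parent = {n: None for n in nodes}
    last = None
    for _ in range(len(nodes)):
        last = None
        for arc in arcs:
            u, v, w = arc[0], arc[1], arc[2]
            if dist[u] + w < dist[v] - 1e-9:
                dist[v] = dist[u] + w
                parent[v] = arc
                last = v
        if last is None:
            return None  # distances converged: no negative cycle
    # A node relaxed in the final round is reachable from a negative cycle;
    # walking back len(nodes) parent steps lands on that cycle.
    start = last
    for _ in range(len(nodes)):
        start = parent[start][0]
    cycle, node = [], start
    while True:
        arc = parent[node]
        cycle.append(arc)
        node = arc[0]
        if node == start:
            return cycle


def estimate_copy_counts(edges, lam, max_copies=20):
    """Start from the all-zero circulation and push one unit around negative
    cycles until none remain, which is optimal for convex separable costs."""
    nodes = sorted({u for u, _, _ in edges} | {v for _, v, _ in edges})
    flow = [0] * len(edges)
    while True:
        arcs = residual_arcs(edges, flow, lam, max_copies)
        cycle = find_negative_cycle(nodes, arcs)
        if cycle is None:
            return flow
        for _, _, _, i, step in cycle:
            flow[i] += step


if __name__ == "__main__":
    # Toy circular genome "a R b R": the repeat R collapses into one edge
    # P -> Q, and the unique segments a and b are two parallel edges Q -> P.
    toy = [("Q", "P", 11), ("Q", "P", 9), ("P", "Q", 26)]
    print(estimate_copy_counts(toy, lam=10))  # [1, 1, 2]
```

In this toy example, rounding each edge on its own would give R = round(26 / 10) = 3. That count breaks conservation at node P, because R must equal a + b. The circulation instead assigns copy counts 1, 1 and 2. Pooling coverage evidence across the graph in this way is the intuition behind a flow-based estimate.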