Computability of models for sequence assembly

Authors:
Paul Medvedev;Konstantinos Georgiou;Gene Myers;Michael Brudno
Affiliations:
University of Toronto, Canada;University of Toronto, Canada;Janelia Farms, Howard Hughes Medical Institute;University of Toronto, Canada
Venue:
WABI'07 Proceedings of the 7th international conference on Algorithms in Bioinformatics
Year:
2007

Citing 5
Cited 5

Exact and approximation algorithms for DNA sequence reconstruction
Computers and Intractability: A Guide to the Theory of NP-Completeness
An efficient reduction technique for degree-constrained subgraph and bidirected network flow problems

STOC '83 Proceedings of the fifteenth annual ACM symposium on Theory of computing
De novo repeat classification and fragment assembly

RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
The fragment assembly string graph

Bioinformatics

Ab initio whole genome shotgun assembly with mated short reads

RECOMB'08 Proceedings of the 12th annual international conference on Research in computational molecular biology
An efficient algorithm for Chinese postman walk on bi-directed de Bruijn graphs

COCOA'10 Proceedings of the 4th international conference on Combinatorial optimization and applications - Volume Part I
Hapsembler: an assembler for highly polymorphic genomes

RECOMB'11 Proceedings of the 15th Annual international conference on Research in computational molecular biology
Paired de bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers

RECOMB'11 Proceedings of the 15th Annual international conference on Research in computational molecular biology
An integer programming approach to DNA sequence assembly

Computational Biology and Chemistry

Abstract

Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show how to use it to find the shortest double-stranded DNA sequence which contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly.
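
As a rough illustration of what it means for a double-stranded sequence to contain a set of k-long words, the sketch below treats a word as covered when it occurs on either strand. This is not the paper's algorithm: it does not build a bidirected graph or solve the Chinese Postman Problem, it only checks a candidate answer. The function names (reverse_complement, canonical, covers_all) are hypothetical and chosen for this example, and the alphabet is assumed to be uppercase A, C, G, T.

```python
from typing import Iterable

# Illustrative sketch only: names are hypothetical, not taken from the paper.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, read in its own 5'-to-3' direction."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick one representative (the lexicographically smaller) of a k-mer
    and its reverse complement, so both strands map to the same word."""
    return min(kmer, reverse_complement(kmer))


def covers_all(sequence: str, kmers: Iterable[str]) -> bool:
    """Check that every k-mer occurs on at least one strand of `sequence`."""
    other_strand = reverse_complement(sequence)
    return all(k in sequence or k in other_strand for k in kmers)
```

For example, under these assumptions covers_all("AACG", {"AAC", "CGT"}) holds because "CGT" appears on the reverse-complement strand "CGTT" even though it is absent from "AACG" itself.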