Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. At the same time, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give what is, to our knowledge, the first optimal polynomial-time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show how to use the solution to find the shortest double-stranded DNA sequence that contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short-read assembly.
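
To make the objective in the second result concrete, the following is a minimal Python sketch of what it means for a double-stranded sequence to "contain" a set of k-long words: each word must occur on the forward strand or on its reverse complement. It also builds a plain de Bruijn edge set over both strands. This is an illustration only, not the paper's algorithm: the function names (`reverse_complement`, `covers_all_kmers`, `de_bruijn_edges`) are hypothetical, the graph is an ordinary directed de Bruijn graph rather than the bidirected graph the abstract refers to, and no Chinese Postman or flow computation is attempted.

```python
from typing import Iterable, Set, Tuple

# Translation table for complementing a DNA string (assumes the alphabet ACGT only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, read in its own 5'-to-3' direction."""
    return seq.translate(_COMPLEMENT)[::-1]


def covers_all_kmers(seq: str, kmers: Iterable[str]) -> bool:
    """Check that every k-mer occurs on seq or on its reverse complement.

    This is the feasibility condition only; it says nothing about whether
    seq is the shortest such sequence.
    """
    rc = reverse_complement(seq)
    return all(kmer in seq or kmer in rc for kmer in kmers)


def de_bruijn_edges(kmers: Iterable[str]) -> Set[Tuple[str, str]]:
    """Build directed de Bruijn edges (prefix -> suffix) for both strands.

    Each k-mer and its reverse complement contribute one edge between
    (k-1)-mers. A bidirected formulation would merge the two strands into
    one node per (k-1)-mer pair; that merging is not done here.
    """
    edges = set()
    for kmer in kmers:
        for strand in (kmer, reverse_complement(kmer)):
            edges.add((strand[:-1], strand[1:]))
    return edges


if __name__ == "__main__":
    words = {"AAC", "CGT"}
    # "CGT" does not occur in "AACG", but it does occur on the opposite strand "CGTT".
    print(covers_all_kmers("AACG", words))
    print(sorted(de_bruijn_edges({"AAC"})))
```

The check treats the two strands symmetrically, which is why a word missing from the forward strand can still be counted as covered.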