The recent proliferation of next-generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances behind longer contigs has been the use of mate pairs, which facilitate the assembly of repeated regions. Most next-generation assemblers incorporate mate pairs algorithmically as heuristic post-processing steps that correct the assembly graph or link contigs into scaffolds. Such methods have allowed the identification of longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future. Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs in a post-processing step. This graph has the potential to be used in place of the de Bruijn graph in any de Bruijn graph-based assembler, while maintaining all other assembly steps such as error correction and repeat resolution. Through assembly results on simulated error-free data, we argue that this can effectively improve contig sizes in assembly.
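To make the idea of putting mate pair information into the graph structure more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an idealized setting that the abstract does not spell out: error-free data and mate k-mers whose start positions are separated by an exactly known distance `d`. Under that assumption, each node is a pair of (k-1)-mers and each paired k-mer contributes one edge. All function names and the toy genome are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmers(genome, k):
    """All k-mers of the genome, in order of position."""
    return [genome[i:i + k] for i in range(len(genome) - k + 1)]


def paired_kmers(genome, k, d):
    """Paired k-mers (a, b) where b starts exactly d positions after a.

    Assumption: d is known exactly; real mate pairs have variable distance.
    """
    return [(genome[i:i + k], genome[i + d:i + d + k])
            for i in range(len(genome) - k - d + 1)]


def build_de_bruijn(kmer_list):
    """Ordinary de Bruijn graph: edge from (k-1)-prefix to (k-1)-suffix."""
    graph = defaultdict(set)
    for a in kmer_list:
        graph[a[:-1]].add(a[1:])
    return dict(graph)


def build_paired_de_bruijn(pairs):
    """Paired graph: nodes are pairs of (k-1)-mers, one edge per paired k-mer."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[(a[:-1], b[:-1])].add((a[1:], b[1:]))
    return dict(graph)


def branching_nodes(graph):
    """Number of nodes with more than one successor (ambiguous extensions)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


# Toy genome in which "ACG" occurs twice with different continuations.
genome = "ACGTTACGCAGGA"
k, d = 3, 4

plain = build_de_bruijn(kmers(genome, k))
paired = build_paired_de_bruijn(paired_kmers(genome, k, d))

print(branching_nodes(plain))   # 1: node "CG" is followed by "GT" and by "GC"
print(branching_nodes(paired))  # 0: the mate k-mer distinguishes the two copies
```

In this toy case the repeated 3-mer `ACG` creates a branch at node `CG` in the ordinary graph, while in the paired graph the two copies become distinct nodes, `("CG", "AC")` and `("CG", "GG")`, because their mates differ. Handling sequencing errors and inexact mate distances is outside the scope of this sketch.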