Accurate base-assignment in the repeat regions of a whole-genome shotgun assembly remains an unsolved problem. Because reads from repeat regions cannot easily be attributed to a unique location in the genome, current assemblers may place them arbitrarily. As a result, the base-assignment error rate in repeats is likely to be much higher than in the rest of the genome. We developed an iterative algorithm, EULER-AIR, that can correct base-assignment errors in finished genome sequences in public databases. The Wolbachia genome is among the best finished genomes. Using this genome project as an example, we demonstrate that EULER-AIR can 1) discover and correct base-assignment errors, 2) provide accurate read assignments, 3) utilize finishing reads for accurate base-assignment, and 4) provide guidance for designing finishing experiments. In the Wolbachia genome, EULER-AIR found 16 positions with ambiguous base-assignment and two positions with erroneous bases. Many other genome sequencing projects have significantly fewer finishing reads than Wolbachia and hence are likely to contain more base-assignment errors in repeats. EULER-AIR is thus a software tool for finding and correcting base-assignment errors in a genome assembly project.