High-throughput sequencing technologies produce a large number of short reads that may contain errors. These sequencing errors are one of the major problems in analyzing such data. Many algorithms and software tools have been proposed to correct errors in short reads, but their computational complexity limits their performance. In this paper, we propose a novel and efficient hybrid approach that combines an alignment-free method with multiple alignments. We construct suffix arrays over all short reads to search for correct overlapping regions. For each correct overlapping region, we form multiple alignments of the substrings that follow it in order to identify and correct the erroneous bases. Our approach can correct all types of errors in short reads produced by different sequencing platforms. Experiments show that our approach provides significantly higher accuracy than previous approaches and is comparable in speed to them, or even faster.
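The two steps described above (a suffix-array search for overlapping regions, then an alignment of the substrings that follow) can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a naive sort-based suffix array, a user-supplied seed string standing in for a "correct overlapping region", and an ungapped column-wise majority vote standing in for the multiple alignment, so it handles substitution errors only. The names `build_suffix_array`, `find_occurrences`, `correct_after_seed`, and the `min_support` parameter are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def build_suffix_array(reads):
    """Naive generalized suffix array: (read_index, offset) pairs sorted by suffix."""
    suffixes = [(i, j) for i, read in enumerate(reads) for j in range(len(read))]
    suffixes.sort(key=lambda p: reads[p[0]][p[1]:])
    return suffixes


def find_occurrences(reads, sa, seed):
    """Binary-search the suffix array for every suffix that starts with `seed`."""
    # Length-limited prefixes of sorted suffixes are themselves in sorted order.
    keys = [reads[i][j:j + len(seed)] for i, j in sa]
    return sa[bisect_left(keys, seed):bisect_right(keys, seed)]


def correct_after_seed(reads, seed, min_support=3):
    """Correct substitutions in the bases that follow a shared seed.

    Assumption: reads sharing `seed` overlap without gaps, so the bases after
    the seed can be compared column by column (a stand-in for multiple alignment).
    """
    sa = build_suffix_array(reads)
    hits = find_occurrences(reads, sa, seed)
    corrected = [list(read) for read in reads]
    k = len(seed)
    col = 0
    while True:
        # Positions of the col-th base after the seed, in reads long enough to have one.
        column = [(i, j + k + col) for i, j in hits if j + k + col < len(reads[i])]
        if len(column) < min_support:
            break
        base, count = Counter(reads[i][p] for i, p in column).most_common(1)[0]
        if 2 * count > len(column):  # overwrite only on a strict majority
            for i, p in column:
                corrected[i][p] = base
        col += 1
    return ["".join(read) for read in corrected]


reads = ["ACGTTGCA", "ACGTTGCA", "ACGTAGCA", "ACGTTGCA"]
fixed = correct_after_seed(reads, "ACGT")
print(fixed)
```

In this toy input the third read differs from the others at the base following the seed `ACGT`, and the majority vote replaces it. A practical tool would use a linear-time suffix array construction and a gapped alignment so that insertions and deletions can also be handled.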