An efficient hybrid approach to correcting errors in short reads

  • Authors:
  • Zhiheng Zhao; Jianping Yin; Yong Li; Wei Xiong; Yubin Zhan

  • Affiliations:
  • School of Computer, National University of Defense Technology, Changsha, China (all authors)

  • Venue:
  • MDAI'11: Proceedings of the 8th International Conference on Modeling Decisions for Artificial Intelligence
  • Year:
  • 2011

Abstract

High-throughput sequencing technologies produce large numbers of short reads that may contain errors. These sequencing errors are one of the major problems in analyzing such data. Many algorithms and software tools have been proposed to correct errors in short reads, but their computational complexity limits their performance. In this paper, we propose a novel and efficient hybrid approach that combines an alignment-free method with multiple alignments. We construct suffix arrays over all short reads to search for correct overlapping regions. For each correct overlapping region, we form multiple alignments of the substrings that follow it in order to identify and correct erroneous bases. Our approach can correct all types of errors in short reads produced by different sequencing platforms. Experiments show that our approach provides significantly higher accuracy than previous approaches and is comparable to them in speed, or even faster.
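
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that are not in the paper: a fixed seed length `k` stands in for a "correct overlapping region", a column-wise majority vote on the single base after the seed stands in for the multiple alignment, and only substitution errors are handled. The function names and the thresholds `min_support` and `max_error_count` are hypothetical.

```python
from collections import Counter
from itertools import groupby


def build_suffix_array(reads):
    """Sort every suffix of every read; each entry is (read_index, offset).

    Naive construction for clarity; a practical tool would use a faster builder.
    """
    entries = [(i, j) for i, read in enumerate(reads) for j in range(len(read))]
    entries.sort(key=lambda e: reads[e[0]][e[1]:])
    return entries


def correct_substitutions(reads, k=4, min_support=3, max_error_count=1):
    """Correct isolated substitutions by voting on the base that follows a shared seed.

    Assumption: suffixes sharing the same k-base prefix come from overlapping reads.
    """
    suffix_array = build_suffix_array(reads)
    corrected = [list(read) for read in reads]

    # Keep only suffixes long enough to hold the seed plus one following base.
    usable = [(i, j) for i, j in suffix_array if len(reads[i]) - j > k]

    def seed(entry):
        i, j = entry
        return reads[i][j:j + k]

    # Suffixes with the same seed are adjacent in the sorted suffix array.
    for _, group in groupby(usable, key=seed):
        group = list(group)
        if len(group) < min_support:
            continue  # too few overlapping reads to trust a vote
        column = Counter(reads[i][j + k] for i, j in group)
        consensus, votes = column.most_common(1)[0]
        if votes < min_support:
            continue
        for i, j in group:
            base = reads[i][j + k]
            # Rewrite only rare disagreeing bases; frequent variants are left alone.
            if base != consensus and column[base] <= max_error_count:
                corrected[i][j + k] = consensus
    return ["".join(read) for read in corrected]
```

Calling `correct_substitutions` on a handful of overlapping reads with one mismatched base returns the reads with that base replaced by the consensus. Genomic repeats followed by different bases would confuse this simple vote, which is one reason a full multiple alignment of the following substrings is the more robust choice.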