Sequencing by hybridization with errors: handling longer sequences

  • Authors: Dekel Tsur

  • Affiliations: Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA

  • Venue: Theoretical Computer Science
  • Year: 2005

Abstract

Sequencing by hybridization (SBH) is a method for reconstructing a DNA sequence given the set of all contiguous subsequences (substrings) of length k of the target sequence. This set, called the spectrum of the sequence, can be obtained from hybridization with a universal DNA chip. However, hybridization experiments are error prone, which leads to the computational problem of reconstructing a sequence from a noisy spectrum. Halperin et al. gave an algorithm for this problem with provable performance in the presence of both false positive and false negative errors. Assuming, for example, that the false positive rate is small and the probability of a false negative is 0.1, their algorithm can reconstruct a random sequence of length O(2^{0.7k}) with an arbitrarily small probability of failure. In this paper, we give an algorithm that can reconstruct longer sequences: under the assumptions above, our algorithm can reconstruct sequences of length O(2^{0.942k}). This bound is almost optimal, as the bound for the errorless case is Θ(2^k).
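
To make the notions of spectrum and noisy spectrum concrete, the following minimal Python sketch builds the spectrum of a sequence and simulates an error-prone hybridization experiment. It does not implement the reconstruction algorithm of this paper or that of Halperin et al. The function names, the parameters fn_prob and fp_prob, and the assumption that errors occur independently for each length-k string are illustrative choices, not taken from the abstract.

```python
import random
from itertools import product

ALPHABET = "ACGT"  # DNA alphabet


def spectrum(seq, k):
    """Return the set of all length-k substrings of seq (its spectrum)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def noisy_spectrum(seq, k, fn_prob, fp_prob, rng):
    """Simulate a noisy spectrum under an assumed independent error model.

    Each string in the true spectrum is dropped with probability fn_prob
    (false negative); each length-k string outside the true spectrum is
    reported with probability fp_prob (false positive).
    """
    true_spectrum = spectrum(seq, k)
    observed = set()
    # Enumerate all 4^k probes, as on a universal chip (practical only for small k).
    for probe in map("".join, product(ALPHABET, repeat=k)):
        if probe in true_spectrum:
            if rng.random() >= fn_prob:  # kept unless a false negative occurs
                observed.add(probe)
        elif rng.random() < fp_prob:     # spurious hybridization signal
            observed.add(probe)
    return observed


if __name__ == "__main__":
    rng = random.Random(0)
    target = "ACGTACGGT"
    print(sorted(spectrum(target, 3)))
    print(sorted(noisy_spectrum(target, 3, fn_prob=0.1, fp_prob=0.01, rng=rng)))
```

With fn_prob=0.1, as in the example from the abstract, roughly one in ten true length-k substrings is missing from the observed set on average, so a reconstruction algorithm has to bridge such gaps while ignoring spurious probes.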