Algorithm: Dealing with repetitions in sequencing by hybridization

  • Authors:
  • Jacek Blazewicz;Fred Glover;Marta Kasprzak;Wojciech T. Markiewicz;Ceyda Oğuz;Dietrich Rebholz-Schuhmann;Aleksandra Swiercz

  • Affiliations:
  • Institute of Computing Science, Poznań University of Technology, Piotrowo 2, 60-965 Poznań, Poland and Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12, 61- ...;University of Colorado, Boulder, CO 80309-0419, USA;Institute of Computing Science, Poznań University of Technology, Piotrowo 2, 60-965 Poznań, Poland and Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12, 61- ...;Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12, 61-704 Poznań, Poland;Department of Industrial Engineering, Koç University, Rumeli Feneri Yolu, 34450 Sariyer, Istanbul, Turkey;European Bioinformatics Institute, Cambridge, UK;Institute of Computing Science, Poznań University of Technology, Piotrowo 2, 60-965 Poznań, Poland and Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12, 61- ...

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2006

Abstract

DNA sequencing by hybridization (SBH) is subject to errors introduced in the biochemical experiment. Some of them are random and disappear when the experiment is repeated. Others are systematic and arise from repetitions of probes within the target sequence. A good method for solving SBH problems must deal with both types of errors. In this work we propose a new hybrid genetic algorithm for isothermic and standard sequencing that incorporates the concept of structured combinations. The algorithm is then compared with other methods designed for handling errors that arise in standard and isothermic SBH approaches. The DNA sequences used for testing are taken from GenBank. The set of test instances was divided into two groups. The first group consisted of sequences whose spectra contain positive and negative errors at a rate of up to 20%, excluding errors coming from repetitions. The second group consisted of sequences containing repeated oligonucleotides, with additional errors of up to 5% added to the spectra. Our new method outperforms the best alternative procedures on both data sets. Moreover, in the cases without repetitions the method produces solutions exhibiting an extremely high degree of similarity to the target sequences, which is an important outcome for biologists. The spectra prepared from the sequences taken from GenBank are available on our website, http://bio.cs.put.poznan.pl/.
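
To make the error terminology concrete, the following minimal Python sketch builds the ideal spectrum of a short sequence for standard SBH (fixed probe length) and shows why a repeated oligonucleotide acts as a negative error: the spectrum is a set, so the second occurrence leaves no trace. It is an illustration only and not the authors' hybrid genetic algorithm. The function names (`ideal_spectrum`, `repeated_probes`, `spectrum_errors`) and the toy sequence are assumptions made for this example, and the isothermic variant with variable-length probes is not covered.

```python
from __future__ import annotations

from collections import Counter


def ideal_spectrum(sequence: str, probe_length: int) -> set[str]:
    """Return the error-free spectrum: every distinct substring of the
    given length. Multiplicity is lost because the result is a set."""
    last_start = len(sequence) - probe_length
    return {sequence[i:i + probe_length] for i in range(last_start + 1)}


def repeated_probes(sequence: str, probe_length: int) -> dict[str, int]:
    """Return the probes that occur more than once in the sequence,
    together with their number of occurrences."""
    last_start = len(sequence) - probe_length
    counts = Counter(sequence[i:i + probe_length] for i in range(last_start + 1))
    return {probe: n for probe, n in counts.items() if n > 1}


def spectrum_errors(ideal: set[str], observed: set[str]) -> tuple[set[str], set[str]]:
    """Compare an observed spectrum with the ideal one.

    Negative errors are probes missing from the observed spectrum;
    positive errors are probes present in it that do not belong to the target.
    """
    negative = ideal - observed
    positive = observed - ideal
    return negative, positive


if __name__ == "__main__":
    target = "ACGTACGG"  # hypothetical toy sequence, not taken from GenBank
    length = 3

    ideal = ideal_spectrum(target, length)
    positions = len(target) - length + 1
    print("probe positions:", positions, "distinct probes:", len(ideal))
    print("repeated probes:", repeated_probes(target, length))

    # Simulate one random negative error and one random positive error.
    observed = (ideal - {"GTA"}) | {"TTT"}
    negative, positive = spectrum_errors(ideal, observed)
    print("negative errors:", negative, "positive errors:", positive)
```

In this toy case the gap between the number of probe positions and the number of distinct probes is exactly the information lost to repetitions, which a reconstruction method has to recover by allowing some probes to be used more than once.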