Computing Highly Specific and Mismatch Tolerant Oligomers Efficiently

  • Authors:
  • Tomoyuki Yamada; Shinichi Morishita

  • Venue:
  • CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
  • Year:
  • 2003

Abstract

The sequencing of the genomes of a variety of species and the growing databases containing expressed sequence tags (ESTs) and complementary DNAs (cDNAs) facilitate the design of highly specific oligomers for use as genomic markers, PCR primers, or DNA oligo microarrays. The first step in evaluating the specificity of short oligomers of about twenty units in length is to determine the frequencies at which the oligomers occur. However, for oligomers longer than about fifty units this is not efficient, as they usually have a frequency of only 1. A more suitable procedure is to consider the mismatch tolerance of an oligomer, that is, the minimum number of mismatches that allows a given oligomer to match a sub-sequence other than the target sequence anywhere in the genome or the EST database. However, calculating the exact value of mismatch tolerance is computationally costly and impractical. Therefore, we studied the problem of checking whether an oligomer meets the constraint that its mismatch tolerance is no less than a given threshold. Here, we present an efficient dynamic programming algorithm solution that utilizes suffix and height arrays. We demonstrated the effectiveness of this algorithm by efficiently computing a dense list of oligo-markers applicable to the human genome. Experimental results show that the algorithm runs faster than the well-known Abrahamson's algorithm by orders of magnitude and is able to enumerate 63% to 79% of qualified oligomers.
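
The definitions in the abstract can be made concrete with a small brute-force sketch. It is not the authors' dynamic programming algorithm, which this record does not describe in detail; it only illustrates what frequency, mismatch tolerance, and the threshold check mean on a toy sequence. All function names are hypothetical, mismatches are assumed to be Hamming mismatches (substitutions only), reverse complements are ignored, and the suffix and height arrays are built naively.

```python
def suffix_array(text):
    """Start positions of all suffixes in lexicographic order (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def height_array(text, sa):
    """height[k] = length of the common prefix of suffixes sa[k-1] and sa[k]."""
    height = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = text[sa[k - 1]:], text[sa[k]:]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        height[k] = n
    return height


def frequency(sa, height, start, length):
    """Occurrences of text[start:start+length], read off the height array."""
    k = sa.index(start)
    lo = k
    while lo > 0 and height[lo] >= length:
        lo -= 1
    hi = k
    while hi + 1 < len(sa) and height[hi + 1] >= length:
        hi += 1
    return hi - lo + 1


def capped_hamming(a, b, cap):
    """Count mismatching positions of equal-length strings, stopping at cap."""
    count = 0
    for x, y in zip(a, b):
        if count >= cap:
            break
        if x != y:
            count += 1
    return count


def mismatch_tolerance(text, start, length):
    """Exact tolerance: fewest mismatches to any other window (costly)."""
    oligo = text[start:start + length]
    return min(
        (capped_hamming(oligo, text[p:p + length], length)
         for p in range(len(text) - length + 1) if p != start),
        default=length,  # assumption: no other window means maximal tolerance
    )


def meets_threshold(text, start, length, threshold):
    """Threshold check: is the tolerance at least `threshold`?"""
    oligo = text[start:start + length]
    return all(
        capped_hamming(oligo, text[p:p + length], threshold) >= threshold
        for p in range(len(text) - length + 1) if p != start
    )


if __name__ == "__main__":
    text = "ACGTTTTTACGA"          # toy "genome"
    sa = suffix_array(text)
    height = height_array(text, sa)
    print(frequency(sa, height, 0, 4))      # occurrences of "ACGT"
    print(mismatch_tolerance(text, 0, 4))   # "ACGA" at position 8 is one mismatch away
    print(meets_threshold(text, 0, 4, 2))
```

In the toy sequence the oligomer "ACGT" occurs only once, yet a single mismatch turns it into "ACGA", which occurs elsewhere. This is the situation the abstract describes: frequency alone does not separate good candidates from poor ones, while the threshold check can stop counting as soon as a window reaches the required number of mismatches.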