The sequencing of the genomes of a variety of species and the growing databases containing expressed sequence tags (ESTs) and complementary DNAs (cDNAs) facilitate the design of highly specific oligomers for use as genomic markers, PCR primers, or DNA oligo microarrays. The first step in evaluating the specificity of short oligomers of about twenty units in length is to determine the frequencies at which the oligomers occur. However, for oligomers longer than about fifty units this is not efficient, as they usually have a frequency of only 1. A more suitable procedure is to consider the mismatch tolerance of an oligomer, that is, the minimum number of mismatches that allows a given oligomer to match a sub-sequence other than the target sequence anywhere in the genome or the EST database. However, calculating the exact value of mismatch tolerance is computationally costly and impractical. Therefore, we studied the problem of checking whether an oligomer meets the constraint that its mismatch tolerance is no less than a given threshold. Here, we present an efficient dynamic programming algorithm that utilizes suffix and height arrays. We demonstrated the effectiveness of this algorithm by efficiently computing a dense list of oligo-markers applicable to the human genome. Experimental results show that the algorithm runs faster than the well-known Abrahamson's algorithm by orders of magnitude and is able to enumerate 63% to 79% of qualified oligomers.
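To make the definition of mismatch tolerance concrete, the following is a minimal brute-force sketch in Python. It is not the suffix-array and height-array algorithm summarized above. It only illustrates the quantity being bounded, and why checking a threshold can be cheaper than computing the exact value: each comparison can stop as soon as the threshold is reached. The function names (`hamming_capped`, `mismatch_tolerance`, `meets_threshold`) and the toy sequence are assumptions made for illustration, and the "target sequence" is assumed to be the oligomer's own occurrence in the text.

```python
def hamming_capped(a, b, cap):
    """Count mismatches between equal-length strings, stopping once `cap` is reached."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches >= cap:
                break
    return mismatches


def mismatch_tolerance(text, start, length):
    """Exact mismatch tolerance of text[start:start+length] (naive scan).

    Returns the minimum number of mismatches against any other window of the
    same length in `text`. Illustrative only: quadratic in the worst case.
    """
    oligo = text[start:start + length]
    best = length  # two windows cannot differ in more than `length` positions
    for pos in range(len(text) - length + 1):
        if pos == start:
            continue  # skip the target occurrence itself
        best = min(best, hamming_capped(oligo, text[pos:pos + length], best))
    return best


def meets_threshold(text, start, length, threshold):
    """Check whether the mismatch tolerance is no less than `threshold`."""
    oligo = text[start:start + length]
    for pos in range(len(text) - length + 1):
        if pos == start:
            continue
        # Any other window closer than `threshold` mismatches disqualifies the oligomer.
        if hamming_capped(oligo, text[pos:pos + length], threshold) < threshold:
            return False
    return True


if __name__ == "__main__":
    toy = "ACGTACGATT"  # hypothetical toy sequence, not real genomic data
    print(mismatch_tolerance(toy, 0, 4))  # "ACGT" vs "ACGA" differ in one position
    print(meets_threshold(toy, 6, 4, 3))  # "GATT" is at least 3 mismatches from every other window
```

Even with the early exit, this baseline compares the oligomer against every window of the text, which is what makes an index-based approach attractive at genome scale.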