The design of large scale DNA microarrays is a challenging problem. So far, probe selection algorithms must trade the ability to cope with large scale problems for a loss of accuracy in the estimation of probe quality. We present an approach based on jumps in matching statistics that combines the best of both worlds.

This article consists of two parts. The first part is theoretical. We introduce the notion of jumps in matching statistics between two strings and derive their properties. We estimate the frequency of jumps for random strings in a non-uniform Bernoulli model and present a new heuristic argument to find the center of the length distribution of the longest substring that two random strings have in common. The results are generalized to near-perfect matches with a small number of mismatches.

In the second part, we use the concept of jumps to improve the accuracy of the longest common factor approach for probe selection by moving from a string-based to an energy-based specificity measure, while only slightly more than doubling the selection time.
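To make the central objects concrete, here is a minimal Python sketch of matching statistics and of one possible reading of a "jump". The abstract does not define either term, so everything here is an assumption: ms[i] is taken to be the length of the longest substring of s starting at position i that also occurs in t, and a jump is taken to be a position where ms[i] exceeds the guaranteed lower bound max(ms[i-1] - 1, 0). The function names are hypothetical, and the quadratic-time search stands in for the index-based methods a real implementation would use.

```python
def matching_statistics(s, t):
    """Return ms, where ms[i] is the length of the longest prefix of s[i:]
    that occurs somewhere in t (assumed definition, naive computation)."""
    ms = []
    length = 0
    for i in range(len(s)):
        # If s[i-1 : i-1+k] occurs in t, then s[i : i-1+k] does too,
        # so the previous value minus one is a valid starting point.
        length = max(length - 1, 0)
        while i + length < len(s) and s[i:i + length + 1] in t:
            length += 1
        ms.append(length)
    return ms


def jumps(ms):
    """Positions i >= 1 where ms[i] exceeds the lower bound max(ms[i-1] - 1, 0).
    This is an assumed definition of a jump, not taken from the article."""
    return [i for i in range(1, len(ms)) if ms[i] > max(ms[i - 1] - 1, 0)]


if __name__ == "__main__":
    s, t = "abcxbcd", "abcd"
    ms = matching_statistics(s, t)
    print(ms)         # [3, 2, 1, 0, 3, 2, 1]
    print(jumps(ms))  # [4]
    print(max(ms))    # 3, the length of the longest common factor of s and t
```

Under these assumptions, the maximum of ms is the length of the longest common factor of the two strings, and between jumps the values simply decrease by one, which suggests why only the jump positions carry new information.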