Fast and Sensitive Probe Selection for DNA Chips Using Jumps in Matching Statistics
CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Let $\Sigma$ be a finite alphabet with $C$ letters. For any two strings $x$ and $y$ of length $n$, let $S(x,y)$ denote the length of the longest common substring (a run of consecutive letters) of $x$ and $y$; that is, $S(x,y)$ is the largest $k$ such that $$ x_i \cdots x_{i+k-1} = y_j \cdots y_{j+k-1}$$ for some $i$ and $j$. We show that for $x$ and $y$ chosen uniformly at random from all strings of length $n$, $S(x,y)$ is highly concentrated around $2 \log_C n$. More precisely, for any $a \geq 1$ $$ \Pr [ |S(x,y) - 2 \log_C n |