Fast and Sensitive Probe Selection for DNA Chips Using Jumps in Matching Statistics

Authors:
Sven Rahmann
Affiliations:
-
Venue:
CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Year:
2003

Abstract

The design of large scale DNA microarrays is a challenging problem. So far, probe selection algorithms must trade the ability to cope with large scale problems for a loss of accuracy in the estimation of probe quality. We present an approach based on jumps in matching statistics that combines the best of both worlds.

This article consists of two parts. The first part is theoretical. We introduce the notion of jumps in matching statistics between two strings and derive their properties. We estimate the frequency of jumps for random strings in a non-uniform Bernoulli model and present a new heuristic argument to find the center of the length distribution of the longest substring that two random strings have in common. The results are generalized to near-perfect matches with a small number of mismatches.

In the second part, we use the concept of jumps to improve the accuracy of the longest common factor approach for probe selection by moving from a string-based to an energy-based specificity measure, while only slightly more than doubling the selection time.
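
To make the central notion concrete, the following is a minimal sketch of matching statistics and jumps on two short strings. It is not the paper's algorithm. It uses a naive quadratic scan instead of an index structure, and it assumes a particular reading of "jump": a position i > 0 where the matching statistic does not simply drop by one from the previous position, that is ms[i] > ms[i-1] - 1. The function names `matching_statistics` and `jumps` are hypothetical and chosen for illustration only.

```python
def matching_statistics(s: str, t: str) -> list[int]:
    """ms[i] = length of the longest prefix of s[i:] that occurs somewhere in t.

    Naive illustration only; practical tools would use a suffix tree or
    suffix array instead of repeated substring searches.
    """
    ms = []
    for i in range(len(s)):
        length = 0
        # Extend the match while the next longer prefix of s[i:] still occurs in t.
        while i + length < len(s) and s[i:i + length + 1] in t:
            length += 1
        ms.append(length)
    return ms


def jumps(ms: list[int]) -> list[int]:
    """Positions i > 0 with ms[i] > ms[i-1] - 1.

    Assumed definition: matching statistics always satisfy
    ms[i] >= ms[i-1] - 1, so a jump is a position where the inequality
    is strict. Position 0 is not reported here.
    """
    return [i for i in range(1, len(ms)) if ms[i] > ms[i - 1] - 1]


if __name__ == "__main__":
    ms = matching_statistics("ACGTAC", "CGTTAC")
    print(ms)         # [2, 3, 2, 3, 2, 1]
    print(jumps(ms))  # [1, 3]
```

In this toy example the matches "AC", "CGT" and "TAC" each start a new maximal match, and the positions in between only inherit a shortened version of the previous match. Under the assumed definition, only the jump positions carry new information, which is why restricting attention to them can save work.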