Composite pattern discovery for PCR application

Authors:
Stanislav Angelov;Shunsuke Inenaga
Affiliations:
Department of Computer and Information Science, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA;Department of Informatics, Kyushu University, Fukuoka, Japan
Venue:
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Year:
2005

Citing 9
Cited 1

Searching subsequences. Theoretical Computer Science
Algorithms on strings, trees, and sequences: computer science and computational biology
Discovering Best Variable-Length-Don't-Care Patterns. DS '02 Proceedings of the 5th International Conference on Discovery Science
Finding Best Patterns Practically. Progress in Discovery Science, Final Report of the Japanese Discovery Science Project
Efficient Discovery of Proximity Patterns with Suffix Arrays. CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
A Practical Algorithm to Find the Best Subsequence Patterns. DS '00 Proceedings of the Third International Conference on Discovery Science
A Practical Algorithm to Find the Best Episode Patterns. DS '01 Proceedings of the 4th International Conference on Discovery Science
Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development - Mathematics and computing
An O(N^2) Algorithm for Discovering Optimal Boolean Pattern Pairs. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Missing pattern discovery. Journal of Discrete Algorithms

Abstract

We consider the problem of finding pairs of short patterns such that, in a given input sequence of length n, the distance between each pair's patterns is at least α. The problem was introduced in [1] and is motivated by the optimization of multiplexed nested PCR. We study algorithms for two cases: the special case in which the two patterns in a pair are required to have the same length, and the more general case in which the patterns can have different lengths. For the first case we present an O(αn log log n) time and O(n) space algorithm, and for the general case we give an O(αn log n) time and O(n) space algorithm. The algorithms work for any alphabet size and use asymptotically less space than the algorithms presented in [1]. For alphabets of constant size we also give an $O(n\sqrt{n}\,\log^{2} n)$ time algorithm for the general case. We demonstrate that the algorithms perform well in practice and present our findings for the human genome. In addition, we study an extended version of the problem in which the patterns in a pair occur at certain positions at a distance of at most α, but do not occur α-close anywhere else in the input sequence.
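To make the same-length case concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm from the paper and does not achieve the stated time or space bounds. It rests on several assumptions that the abstract does not spell out: both patterns have a fixed length `k`, distance is measured between start positions, pairs are ordered (the first pattern starts at or before the second), and two occurrences count as "α-close" when their start positions differ by less than `alpha`. The function names `close_pairs` and `missing_pairs` and the default DNA alphabet are hypothetical, chosen only for illustration.

```python
from itertools import product


def close_pairs(text, k, alpha):
    """Collect ordered k-mer pairs (A, B) that occur alpha-close in text.

    Assumption: A starts at i, B starts at j, and 0 <= j - i < alpha.
    """
    last_start = len(text) - k  # last valid start position of a k-mer
    seen = set()
    for i in range(last_start + 1):
        a = text[i:i + k]
        # Scan every start position j within the window [i, i + alpha).
        for j in range(i, min(i + alpha, last_start + 1)):
            seen.add((a, text[j:j + k]))
    return seen


def missing_pairs(text, k, alpha, alphabet="ACGT"):
    """Return ordered k-mer pairs that never occur alpha-close in text.

    Enumerates all |alphabet|^(2k) candidate pairs, so this is only
    practical for very small k.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    seen = close_pairs(text, k, alpha)
    return [(a, b) for a in kmers for b in kmers if (a, b) not in seen]
```

For example, `missing_pairs("ACGT", 1, 2)` keeps the pair `("A", "G")`, because G starts two positions after A, and drops `("A", "C")`, because C starts one position after A. The window scan performs at most α set insertions per text position, but storing the observed pairs can take far more than O(n) space, which is one of the costs the algorithms in the paper are designed to avoid.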