Approximate matching in weighted sequences

Authors:
Amihood Amir;Costas Iliopoulos;Oren Kapah;Ely Porat
Affiliations:
Department of Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel and College of Computing, Georgia Tech, Atlanta, GA;Department of Computer Science, King's College London, Strand, London, United Kingdom;Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel;Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel
Venue:
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Year:
2006

Abstract

Weighted sequences have been recently introduced as a tool to handle a set of sequences that are not identical but have many local similarities. The weighted sequence is a “statistical image” of this set, where the probability of every symbol's occurrence at every text location is given. We address the problem of approximately matching a pattern in such a weighted sequence. The pattern is a given string and we seek all locations in the set where the pattern occurs with a high enough probability. We define the notion of Hamming distance and edit distance in weighted sequences and give efficient algorithms for computing them. We compute two versions of the Hamming distance in time $O(n \sqrt{m\log m})$, where $n$ is the length of the weighted text and $m$ is the pattern length. The edit distance is computed in time $O(nm)$ and $O(nm^2)$, depending on the edit distance definition used. Unfortunately, due to space considerations, the edit distance details are left to the journal version. We also define the notion of weighted matching in infinite alphabets and show that exact weighted matching can be computed in time $O(s\log^2 s)$, where $s$ is the number of text symbols having non-zero probability. The weighted Hamming distance over infinite alphabets can be computed in time $\min(O(kn\sqrt{s}+s^{3/2}\log^2 s), O(s^{4/3}m^{1/3}\log s))$.
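
To make the setting concrete, the following is a minimal sketch of exact matching in a weighted sequence. It is a naive $O(nm)$ scan, not one of the algorithms of the paper. It assumes that symbol probabilities at different text locations are independent, so the probability that the pattern occurs at a location is the product of the per-position probabilities; the abstract does not state this definition. The names `occurrence_probability`, `weighted_matches` and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
from math import prod
from typing import Dict, List

# Assumed representation: one dict per text location, mapping each symbol
# to its probability of occurring there (absent symbols have probability 0).
WeightedSequence = List[Dict[str, float]]


def occurrence_probability(text: WeightedSequence, pattern: str, start: int) -> float:
    """Probability that `pattern` occurs at `start`, assuming independent positions."""
    return prod(text[start + j].get(symbol, 0.0) for j, symbol in enumerate(pattern))


def weighted_matches(text: WeightedSequence, pattern: str, threshold: float) -> List[int]:
    """Naive scan: all start locations whose occurrence probability is >= threshold."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if occurrence_probability(text, pattern, i) >= threshold
    ]


# Illustrative weighted text of length 4 (made-up probabilities).
text = [
    {"A": 0.5, "C": 0.5},
    {"A": 1.0},
    {"C": 0.75, "G": 0.25},
    {"A": 0.5, "T": 0.5},
]

matches = weighted_matches(text, "AC", threshold=0.5)
```

Here the pattern `AC` can only start at location 1, where it occurs with probability $1.0 \cdot 0.75 = 0.75$; at the other locations some pattern symbol has probability zero.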