Set Intersection and Sequence Matching

Authors:
Ariel Shiftan;Ely Porat
Affiliations:
Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel 52900;Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel 52900
Venue:
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Year:
2009

Abstract

In the classical pattern matching problem, one is given a text and a pattern, both of which are sequences of letters, and is required to find all occurrences of the pattern in the text. We study two modifications of the classical problem, in which each letter of the text and pattern is a set (the Set Intersection Matching problem) or a sequence (the Sequence Matching problem). Two "letters" are considered to match if the intersection of the two corresponding sets is not empty, or if the two sequences have a common element at the same index. We show the first known non-trivial and efficient algorithms for these problems, for the case where the maximum set/sequence size is small. The first is randomized and takes $\Theta\left( 2^dn\ln n\log m\right)$ time, where $d$ is the maximum set/sequence size; with slight modifications, it also fits the case where up to $k$ mismatches are allowed. The second is deterministic and takes $\Theta\left( 4^{d}n\log m\right)$ time. The third algorithm, also deterministic, counts the number of matches at each index of the text in total running time $\Theta\left( \sum_{i=1}^{d} {|\Sigma| \choose i} n\log m \right)$.
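
To make the two match definitions concrete, the following is a minimal brute-force sketch in Python. It is not one of the algorithms from the paper: it simply checks every alignment directly, in roughly O(nmd) time, and serves only as a reference for what counts as an occurrence. The function names `set_intersection_matches` and `sequence_matches` and the use of integers as elements are assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def set_intersection_matches(
    text: Sequence[Set[int]], pattern: Sequence[Set[int]]
) -> List[int]:
    """Naive baseline: indices i where every pattern set intersects text[i + j]."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        # Two set "letters" match when their intersection is non-empty.
        if all(text[i + j] & pattern[j] for j in range(m))
    ]


def sequence_matches(
    text: Sequence[Tuple[int, ...]], pattern: Sequence[Tuple[int, ...]]
) -> List[int]:
    """Naive baseline: indices i where every pattern sequence matches text[i + j]."""

    def letters_match(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
        # Two sequence "letters" match when they agree at some common position.
        return any(x == y for x, y in zip(a, b))

    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(letters_match(text[i + j], pattern[j]) for j in range(m))
    ]
```

For example, with the text `[{1, 2}, {3}, {2, 4}, {3, 5}]` and the pattern `[{2}, {3}]`, the set version reports occurrences at indices 0 and 2, since the alignment at index 1 fails on `{3}` versus `{2}`.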