Faster pattern matching with character classes using prime number encoding

Authors:
Chaim Linhart;Ron Shamir
Affiliations:
School of Computer Science, Tel Aviv University, Tel-Aviv 69978, Israel;School of Computer Science, Tel Aviv University, Tel-Aviv 69978, Israel
Venue:
Journal of Computer and System Sciences
Year:
2009

Abstract

In pattern matching with character classes, the goal is to find all occurrences of a pattern of length $m$ in a text of length $n$, where each pattern position specifies a set of allowed characters from a finite alphabet $\Sigma$. We present an FFT-based algorithm that uses a novel prime-number encoding scheme and is $\log n / \log m$ times faster than the fastest extant approaches, which are based on boolean convolutions. In particular, if $m^{|\Sigma|} = n^{O(1)}$, our algorithm runs in time $O(n \log m)$, matching the complexity of the fastest techniques for wildcard matching, a special case of our problem. A major advantage of our algorithm is that it allows a tradeoff between the running time and the RAM word size. Our algorithm also speeds up solutions to approximate matching problems with character classes, namely matching with $k$ mismatches and Hamming distance, as well as solutions to the subset matching problem.
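To make the problem statement concrete, the following is a minimal sketch in Python of the problem itself and of a convolution-based baseline of the kind the abstract compares against. It is not the prime-number encoding scheme of the paper. All function names (`naive_matches`, `fft`, `correlate`, `mismatch_counts`, `convolution_matches`) and the example data are assumptions made for illustration. The baseline runs one FFT-based correlation per alphabet character between an indicator vector of the text and an indicator vector of the pattern positions that forbid that character; the sum over all characters is the number of mismatches at each alignment.

```python
import cmath


def naive_matches(text, pattern):
    """Reference O(nm) matcher; `pattern` is a list of sets of allowed characters."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1)
            if all(text[i + j] in pattern[j] for j in range(m))]


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. No 1/n scaling."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def correlate(t, p):
    """Return c[i] = sum_j t[i + j] * p[j] for every full alignment i."""
    n, m = len(t), len(p)
    size = 1
    while size < n + m:
        size *= 2
    # Correlation is a convolution with the reversed pattern vector.
    fa = fft([complex(x) for x in t] + [0j] * (size - n))
    fb = fft([complex(x) for x in reversed(p)] + [0j] * (size - m))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [round(prod[i + m - 1].real / size) for i in range(n - m + 1)]


def mismatch_counts(text, pattern, alphabet):
    """Hamming distance between the pattern and the text at each alignment."""
    counts = [0] * max(len(text) - len(pattern) + 1, 0)
    for c in alphabet:
        t_ind = [1 if ch == c else 0 for ch in text]         # text holds c here
        p_ind = [0 if c in cls else 1 for cls in pattern]    # position forbids c
        for i, v in enumerate(correlate(t_ind, p_ind)):
            counts[i] += v
    return counts


def convolution_matches(text, pattern, alphabet):
    """Alignments with zero mismatches, i.e. exact character-class matches."""
    return [i for i, d in enumerate(mismatch_counts(text, pattern, alphabet))
            if d == 0]


# Hypothetical example over the alphabet {A, C, G, T}.
text = "ACGTACGA"
pattern = [{"A"}, {"C", "G"}, {"G", "T"}]
print(naive_matches(text, pattern))
print(convolution_matches(text, pattern, "ACGT"))
print(mismatch_counts(text, pattern, "ACGT"))
```

In this sketch a wildcard is simply a class equal to the whole alphabet, and matching with $k$ mismatches amounts to keeping the alignments whose count is at most $k$. The cost grows with the alphabet size because each character needs its own correlation, which is the overhead that encoding-based approaches aim to reduce.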