Worst-case efficient single and multiple string matching on packed texts in the word-RAM model

Authors: Djamal Belazzougui
Affiliations: LIAFA, Univ. Paris Diderot, Paris 7, 75205 Paris Cedex 13, France
Venue: Journal of Discrete Algorithms
Year: 2012

Abstract

In this paper, we explore worst-case solutions for the problems of single and multiple string matching in the word-RAM model with word length $w$. In the first problem, we have to build a data structure based on a pattern $p$ of length $m$ over an alphabet of size $\sigma$ such that we can answer the following query: given a text $T$ of length $n$, where each character is encoded using $\log\sigma$ bits, return the positions of all occurrences of $p$ in $T$ (in the following, $occ$ denotes the number of reported occurrences). For the multi-pattern matching problem, we have a set $S$ of $d$ patterns of total length $m$, and a query on a text $T$ consists of finding all positions of all occurrences in $T$ of the patterns in $S$. As each character of the text is encoded using $\log\sigma$ bits and we can read $w$ bits in constant time in the RAM model, we assume that we can read up to $\Theta(w/\log\sigma)$ consecutive characters of the text in one time step. This implies that the fastest possible query time for both problems is $O(n\frac{\log\sigma}{w}+occ)$. In this paper we present several results for both problems that come close to this best possible query time. We first present two linear-space data structures, one for each problem: the first answers single pattern matching queries in time $O(n(\frac{1}{m}+\frac{\log\sigma}{w})+occ)$, while the second answers multiple pattern matching queries in time $O(n(\frac{\log d+\log y+\log\log m}{y}+\frac{\log\sigma}{w})+occ)$, where $y$ is the length of the shortest pattern. We then show how a simple application of the Four Russians technique yields data structures whose query times are independent of the length of the shortest pattern (the length of the only pattern in the case of single string matching), at the expense of using more space.
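The packed-text assumption behind the lower bound can be illustrated with a minimal sketch. It is not the paper's data structure; it only shows how a text over an alphabet of size sigma fits into roughly n * log(sigma) / w machine words. The names `pack_text` and `unpack_char`, the choice w = 64, and the simplification that no character straddles a word boundary are all assumptions made for illustration.

```python
def bits_per_char(sigma: int) -> int:
    """Bits needed per character for an alphabet of size sigma (ceil(log2 sigma))."""
    return max(1, (sigma - 1).bit_length())


def pack_text(text: str, alphabet: str, w: int = 64):
    """Pack text into w-bit words, floor(w / b) characters per word.

    Assumption: characters never straddle a word boundary, so a few
    high bits of each word may stay unused.
    """
    code = {ch: i for i, ch in enumerate(alphabet)}
    b = bits_per_char(len(alphabet))
    per_word = w // b
    words = []
    for start in range(0, len(text), per_word):
        word = 0
        for offset, ch in enumerate(text[start:start + per_word]):
            word |= code[ch] << (offset * b)
        words.append(word)
    return words, per_word, b


def unpack_char(words, per_word: int, b: int, i: int) -> int:
    """Return the integer code of the i-th character of the packed text."""
    word = words[i // per_word]
    return (word >> ((i % per_word) * b)) & ((1 << b) - 1)


if __name__ == "__main__":
    text = "ACGT" * 20                      # n = 80 characters, sigma = 4
    words, per_word, b = pack_text(text, "ACGT")
    # Each 64-bit word holds 32 two-bit characters, so 80 characters need 3 words.
    print(len(words), per_word, b)
```

Reading the whole text therefore takes about n * b / w word accesses, which is where the $O(n\frac{\log\sigma}{w})$ term in the bounds comes from; the $occ$ term accounts for reporting the matches.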