Succinct dictionary matching with no slowdown

Authors: Djamal Belazzougui
Affiliations: LIAFA, Univ. Paris Diderot-Paris 7, Paris Cedex 13, France
Venue: CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Year: 2010

Abstract

Dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over a (not necessarily constant) alphabet of size σ, build a data structure that reports all occurrences of strings of S in any text T. The classical solution is the Aho-Corasick automaton, which finds all occ occurrences in a text T in time O(|T| + occ) using a representation that occupies O(m log m) bits of space, where m ≤ n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log σ + O(1)) + O(d log(n/d)) bits of space while still answering queries in O(|T| + occ) time. To the best of our knowledge, the fastest succinct data structure currently known for the dictionary matching problem uses O(n log σ) bits of space and answers queries in O(|T| log log n + occ) time. We also show how the space occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)) bits, where H0 is the empirical entropy of the characters appearing in the trie representation of the set S, provided that σ < m^ε for any constant 0 < ε < 1.
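
For orientation, the classical automaton that the abstract takes as its baseline can be sketched in a few lines of Python. This is the textbook pointer-based Aho-Corasick construction, not the succinct representation the paper proposes. The function names `build_automaton` and `find_all` are illustrative assumptions, and the patterns are assumed to be non-empty strings.

```python
from collections import deque


def build_automaton(patterns):
    """Build a pointer-based Aho-Corasick automaton (illustrative sketch)."""
    goto = [{}]   # goto[s][c] -> child of state s on character c (the trie)
    fail = [0]    # fail[s] -> state of the longest proper suffix of s in the trie
    out = [[]]    # out[s] -> indices of all patterns ending at state s
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)

    # Breadth-first traversal: failure links of shallower states are final
    # before deeper states need them. Depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Copying the inherited outputs keeps reporting simple; a
            # space-conscious version would store a link instead of a copy.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(patterns, text):
    """Return (start, pattern) for every occurrence of a pattern in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            matches.append((pos - len(patterns[idx]) + 1, patterns[idx]))
    return matches


if __name__ == "__main__":
    dictionary = ["he", "she", "his", "hers"]
    print(find_all(dictionary, "ushers"))
    # Number of states m; here n = 12, so m <= n + 1 holds.
    print(len(build_automaton(dictionary)[0]))
```

Each state here holds a dictionary of child pointers and a failure pointer, which is where the O(m log m)-bit cost of the classical representation comes from; the paper's contribution is to replace these pointers with a succinct encoding while keeping the same query time.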