Dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over a (not necessarily constant) alphabet of size σ, build a data structure that finds, in any text T, all occurrences of strings belonging to S. The classical solution to this problem is the Aho-Corasick automaton, which finds all occ occurrences in a text T in time O(|T| + occ) using a representation that occupies O(m log m) bits of space, where m ≤ n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log σ + O(1)) + O(d log(n/d)) bits of space while still answering queries in O(|T| + occ) time. To the best of our knowledge, the fastest succinct data structure currently known for the dictionary matching problem uses O(n log σ) bits of space and answers queries in O(|T| log log n + occ) time. We also show how the space occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)) bits, where H0 is the empirical entropy of the characters appearing in the trie representation of the set S, provided that σ < m^ε for any constant 0 < ε < 1.
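For readers unfamiliar with the baseline, the following is a minimal sketch of the classical, pointer-based Aho-Corasick automaton, i.e. the O(m log m)-bit representation the abstract starts from. It is not the succinct representation proposed in the paper. The function names build_automaton and find_all, and the use of per-state dictionaries and lists, are illustrative assumptions.

```python
from collections import deque


def build_automaton(patterns):
    """Build a classical (pointer-based) Aho-Corasick automaton.

    Illustrative sketch only: this is the textbook structure, not the
    succinct encoding described in the abstract.
    """
    goto = [{}]   # goto[s][c] -> child of state s on character c (the trie)
    fail = [0]    # failure link of each state
    out = [[]]    # indices of the patterns that end at each state

    # Phase 1: insert every pattern into the trie.
    for idx, pat in enumerate(patterns):
        s = 0
        for c in pat:
            if c not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].append(idx)

    # Phase 2: breadth-first traversal to set the failure links.
    # Depth-1 states keep the root (state 0) as their failure link.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for c, t in goto[s].items():
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            # Inherit the matches reachable through the failure link,
            # so reporting costs time proportional to the output size.
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out


def find_all(text, patterns, automaton):
    """Return (start_position, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    s = 0
    hits = []
    for i, c in enumerate(text):
        # Follow failure links until a transition on c exists (or we hit the root).
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        for idx in out[s]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits
```

With the patterns "he", "she", "his" and "hers", scanning the text "ushers" reports "she" at position 1 and both "he" and "hers" at position 2. Each state here stores explicit references to its children and its failure state, which is where the O(m log m) bits come from; copying the inherited outputs into every state is a simplification that trades extra space for simpler reporting.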