Compressed automata for dictionary matching

Authors:
I Tomohiro; Takaaki Nishimoto; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda
Affiliations:
Department of Informatics, Kyushu University, Japan, and Japan Society for the Promotion of Science (JSPS), Japan; Department of Informatics, Kyushu University, Japan; Department of Informatics, Kyushu University, Japan; Department of Informatics, Kyushu University, Japan; Department of Informatics, Kyushu University, Japan
Venue:
CIAA'13 Proceedings of the 18th international conference on Implementation and Application of Automata
Year:
2013

Abstract

A variant of the dictionary matching problem is addressed in which the dictionary is given in SLP-compressed form, that is, as a straight-line program (SLP). An algorithm based on the Aho-Corasick automaton is presented that preprocesses the compressed dictionary $\mathcal{D}$ in $O(n^4 \log n)$ time using $O(n^2 \log N)$ space and recognizes all occurrences of the patterns in $\mathcal{D}$ in amortized $O(h+m)$ running time per character, where $n$ and $N$ are the compressed and uncompressed sizes of $\mathcal{D}$, respectively, $h$ is the height of $\mathcal{D}$, and $m$ is the number of patterns in the dictionary.
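
To make the quantities in the abstract concrete, here is a minimal Python sketch. It is not the paper's compressed automaton: it computes n, N, h and m for a toy SLP, then expands the patterns and runs the classic uncompressed Aho-Corasick automaton as a baseline. The names `SLP`, `PATTERNS`, `length`, `height`, `expand`, `build_aho_corasick` and `find_all` are hypothetical. Two conventions are assumptions made for illustration: that m designated variables name the patterns, and that a terminal rule has height 1.

```python
from collections import deque

# Hypothetical toy SLP: every variable derives one character or the
# concatenation of two previously defined variables.
SLP = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),  # derives "ab"
    "X4": ("X3", "X1"),  # derives "aba"
    "X5": ("X4", "X3"),  # derives "abaab"
}
PATTERNS = ["X3", "X4", "X5"]  # assumption: these m variables are the dictionary

def length(var):
    """Length of the string derived by var, computed without expanding it."""
    rule = SLP[var]
    return 1 if isinstance(rule, str) else length(rule[0]) + length(rule[1])

def height(var):
    """Height of the derivation tree of var (a terminal rule counts as 1)."""
    rule = SLP[var]
    return 1 if isinstance(rule, str) else 1 + max(height(rule[0]), height(rule[1]))

def expand(var):
    """Fully decompress var; this costs time linear in the derived length."""
    rule = SLP[var]
    return rule if isinstance(rule, str) else expand(rule[0]) + expand(rule[1])

def build_aho_corasick(patterns):
    """Classic (uncompressed) Aho-Corasick: trie, failure links, output lists."""
    goto, fail, out = [{}], [0], [[]]
    for idx, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    queue = deque(goto[0].values())  # depth-1 states keep failure link 0
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out

def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence of a pattern in text."""
    goto, fail, out = build_aho_corasick(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits

n = len(SLP)                           # compressed size
m = len(PATTERNS)                      # number of patterns
N = sum(length(v) for v in PATTERNS)   # uncompressed size
h = max(height(v) for v in PATTERNS)   # height of the SLP
hits = find_all("abaabab", [expand(v) for v in PATTERNS])
```

For this toy input the sketch gives n = 5, m = 3, N = 10 and h = 4. The baseline needs space proportional to N because it expands every pattern; the abstract's algorithm instead works on the compressed dictionary within the stated $O(n^2 \log N)$ space.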