Faster compressed dictionary matching

Authors:
Wing-Kai Hon;Tsung-Han Ku;Rahul Shah;Sharma V. Thankachan;Jeffrey Scott Vitter
Affiliations:
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan;Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan;Department of Computer Science, Louisiana State University, Baton Rouge, LA, USA;Department of Computer Science, Louisiana State University, Baton Rouge, LA, USA;Department of Computer Science, The University of Kansas, Lawrence, KS, USA
Venue:
Theoretical Computer Science
Year:
2013

References:

A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM)
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Efficient string matching: an aid to bibliographic search. Communications of the ACM
Indexing compressed text. Journal of the ACM (JACM)
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM Journal on Computing
Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms (TALG)
Compressed indexes for dynamic text collections. ACM Transactions on Algorithms (TALG)
Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms (TALG)
On searching compressed string collections cache-obliviously. Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Compressed Index for Dictionary Matching. DCC '08 Proceedings of the Data Compression Conference
Compressing and indexing labeled trees, with applications. Journal of the ACM (JACM)
Succinct Text Indexing with Wildcards. SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Succinct dictionary matching with no slowdown. CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
A framework for dynamizing succinct data structures. ICALP'07 Proceedings of the 34th international conference on Automata, Languages and Programming

Abstract

Given a set $D$ of $d$ patterns, the dictionary matching problem is to index $D$ such that for any query text $T$, we can locate the occurrences of any pattern within $T$ efficiently. When $D$ contains a total of $n$ characters drawn from an alphabet of size $\sigma$, Hon et al. (2008) [12] gave an $nH_k(D) + o(n\log\sigma)$-bit index which supports a query in $O(|T|(\log^{\epsilon} n + \log d) + occ)$ time, where $\epsilon > 0$ and $H_k(D)$ denotes the $k$th-order entropy of $D$. Very recently, Belazzougui (2010) [3] has proposed an elegant scheme, which takes $n\log\sigma + O(n)$ bits of index space and supports a query in optimal $O(|T| + occ)$ time. In this paper, we provide connections between Belazzougui's index and the XBW compression of Ferragina and Manzini (2005) [8], and show that Belazzougui's index can be slightly modified to be stored in $nH_k(D) + O(n)$ bits, while query time remains optimal; this improves the compressed index by Hon et al. (2008) [12] in both space and time.
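
To make the problem statement concrete, here is a minimal sketch of dictionary matching using a plain, uncompressed Aho-Corasick automaton (the classical approach from "Efficient string matching: an aid to bibliographic search"). It is not the compressed index described in the abstract: it stores the trie with ordinary Python dictionaries and makes no attempt to reach entropy-bounded space. It only illustrates what a query reports and why scanning the text once can suffice. The function names `build_automaton` and `find_all` are hypothetical, and patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of non-empty patterns."""
    goto = [{}]   # goto[state][char] -> next state (trie edges)
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> indices of patterns ending at this state

    # Phase 1: insert every pattern into the trie.
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = nxt
            state = nxt
        out[state].append(idx)

    # Phase 2: breadth-first traversal to set failure links.
    # Depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def find_all(patterns, text):
    """Return (start_position, pattern) for every occurrence of a pattern in text."""
    goto, fail, out = build_automaton(patterns)
    hits = []
    state = 0
    for i, ch in enumerate(text):
        # Follow failure links until some state has an edge labelled ch.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits


if __name__ == "__main__":
    print(find_all(["he", "she", "his", "hers"], "ushers"))
    # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

In this sketch the text is scanned left to right once and each occurrence is reported as it is found, which is the behaviour behind an $O(|T| + occ)$ query bound (assuming constant expected time per dictionary lookup). The indexes discussed in the abstract aim to keep that query behaviour while storing the automaton in far fewer bits.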