Faster compressed dictionary matching

Authors:
Wing-Kai Hon;Tsung-Han Ku;Rahul Shah;Sharma V. Thankachan;Jeffrey Scott Vitter
Affiliations:
National Tsing Hua University, Taiwan;National Tsing Hua University, Taiwan;Louisiana State University, Louisiana;Louisiana State University, Louisiana;The University of Kansas, Kansas
Venue:
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Year:
2010

References (17):
A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM)
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Efficient string matching: an aid to bibliographic search. Communications of the ACM
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Efficient Randomized Dictionary Matching Algorithms (Extended Abstract). CPM '92 Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching
Indexing compressed text. Journal of the ACM (JACM)
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM Journal on Computing
Structuring labeled trees for optimal succinctness, and beyond. FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms (TALG)
Compressed indexes for dynamic text collections. ACM Transactions on Algorithms (TALG)
Compressed Index for Dictionary Matching. DCC '08 Proceedings of the Data Compression Conference
Space-efficient static trees and graphs. SFCS '89 Proceedings of the 30th Annual Symposium on Foundations of Computer Science
Linear pattern matching algorithms. SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Compressing and indexing labeled trees, with applications. Journal of the ACM (JACM)
Succinct Text Indexing with Wildcards. SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Succinct dictionary matching with no slowdown. CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
A framework for dynamizing succinct data structures. ICALP'07 Proceedings of the 34th international conference on Automata, Languages and Programming

Cited by (7):
Succincter text indexing with wildcards. CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
Succinct 2D dictionary matching with no slowdown. WADS'11 Proceedings of the 12th international conference on Algorithms and data structures
Compressed text indexing with wildcards. SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Succinct indexes for circular patterns. ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation
Efficient algorithm for circular burrows-wheeler transform. CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Compressed text indexing with wildcards. Journal of Discrete Algorithms
Compressed indexes for text with wildcards. Theoretical Computer Science

Abstract

Given a set D of d patterns, the dictionary matching problem is to index D so that, for any query text T, we can efficiently locate all occurrences in T of every pattern in D. When D contains a total of n characters drawn from an alphabet of size σ, Hon et al. (2008) gave an (nH_k(D) + o(n log σ))-bit index that supports a query in O(|T|(log^ε n + log d) + occ) time, where ε > 0, H_k(D) denotes the kth-order entropy of D, and occ is the number of occurrences reported. Very recently, Belazzougui (2010) proposed an elegant scheme that takes n log σ + O(n) bits of index space and supports a query in optimal O(|T| + occ) time. In this paper, we provide connections between Belazzougui's index and the XBW compression of Ferragina et al. (2005), and show that Belazzougui's index can be slightly modified to be stored in nH_k(D) + O(n) bits while the query time remains optimal; this improves on the compressed index of Hon et al. (2008) in both space and time.
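
To make the problem statement concrete, the following is a minimal sketch of dictionary matching using the classic pointer-based Aho-Corasick automaton (the "Efficient string matching: an aid to bibliographic search" entry in the reference list). It is not the compressed index described in this abstract: it stores the automaton in ordinary Python dictionaries and lists, so it illustrates only the query behaviour (one pass over T, reporting every occurrence) and none of the space bounds. The function names build_automaton and find_occurrences are illustrative assumptions, not names from the paper.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton (hypothetical helper, uncompressed)."""
    # goto[s]: child transitions of state s; fail[s]: failure link of s;
    # out[s]: indices of all patterns that end at state s.
    goto, fail, out = [{}], [0], [[]]
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)

    # Breadth-first pass: depth-1 states keep failure link 0; deeper states
    # follow their parent's failure links until a matching edge is found.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_occurrences(patterns, text):
    """Return (start_position, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits


if __name__ == "__main__":
    # D = {he, she, his, hers}, T = "ushers"
    print(sorted(find_occurrences(["he", "she", "his", "hers"], "ushers")))
```

With D = {he, she, his, hers} and T = "ushers", the sketch reports "she" at position 1 and both "he" and "hers" at position 2 (0-based). The indexes discussed in the abstract aim for this kind of single-pass query while storing the dictionary in compressed space rather than as explicit pointers.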