Compressed text indexing with wildcards

Authors:
Wing-Kai Hon;Tsung-Han Ku;Rahul Shah;Sharma V. Thankachan;Jeffrey Scott Vitter
Affiliations:
National Tsing Hua University, Taiwan;National Tsing Hua University, Taiwan;Louisiana State University;Louisiana State University;The University of Kansas
Venue:
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Year:
2011

Citing 19
Cited 4

Suffix arrays: a new method for on-line string searches

SIAM Journal on Computing
A Space-Economical Suffix Tree Construction Algorithm

Journal of the ACM (JACM)
Sparse Suffix Trees

COCOON '96 Proceedings of the Second Annual International Conference on Computing and Combinatorics
Dictionary matching and indexing with errors and don't cares

STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Indexing compressed text

Journal of the ACM (JACM)
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching

SIAM Journal on Computing
Note: A simple storage scheme for strings achieving entropy bounds

Theoretical Computer Science
Compressed representations of sequences and full-text indexes

ACM Transactions on Algorithms (TALG)
Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets

ACM Transactions on Algorithms (TALG)
Geometric Burrows-Wheeler Transform: Linking Range Searching and Text Indexing

DCC '08 Proceedings of the Data Compression Conference
Compressed Index for Dictionary Matching

DCC '08 Proceedings of the Data Compression Conference
Linear pattern matching algorithms

SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Orthogonal range searching in linear and almost-linear space

Computational Geometry: Theory and Applications
Succinct Text Indexing with Wildcards

SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
On Entropy-Compressed Text Indexing in External Memory

SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Space efficient indexes for string matching with don't cares

ISAAC'07 Proceedings of the 18th international conference on Algorithms and computation
Succinct dictionary matching with no slowdown

CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Faster compressed dictionary matching

SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Succincter text indexing with wildcards

CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching

On position restricted substring searching in succinct space

Journal of Discrete Algorithms
Indexing hypertext

Journal of Discrete Algorithms
Compressed indexes for text with wildcards

Theoretical Computer Science
A new compression scheme for secure transmission

International Journal of Automation and Computing

Abstract

Let T = T_1 φ^{k_1} T_2 φ^{k_2} ... φ^{k_d} T_{d+1} be a text of total length n, where the characters of each T_i are chosen from an alphabet Σ of size σ, and φ denotes a wildcard symbol. The text indexing with wildcards problem is to index T so that, given a query pattern P, we can locate the occurrences of P in T efficiently. This problem has applications in indexing genomic sequences that contain single-nucleotide polymorphisms (SNPs), because SNPs can be modeled as wildcards. Recently, Tam et al. (2009) and Thachuk (2011) proposed succinct indexes for this problem. In this paper, we present the first compressed index for this problem, which takes only nH_h + o(n log σ) + O(d log n) bits of space, where H_h is the hth-order empirical entropy of T (h = o(log_σ n)).
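
To make the problem statement concrete, the following is a minimal sketch of the matching semantics only: a brute-force scan in which each wildcard in the text matches any single pattern character. It is not the compressed index described in the abstract and makes no attempt at its space or time bounds. The function name naive_wildcard_occurrences, the use of "?" as a stand-in for φ, and the sample strings are all assumptions made for illustration.

```python
WILDCARD = "?"  # stand-in for the wildcard symbol φ (an assumption of this sketch)


def naive_wildcard_occurrences(text: str, pattern: str) -> list[int]:
    """Return the 0-based start positions at which `pattern` occurs in `text`,
    where every WILDCARD character in `text` matches any single character.

    Brute force, O(len(text) * len(pattern)) time; illustrative only.
    """
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        # A position matches if the text character is a wildcard
        # or is identical to the pattern character.
        if all(t == WILDCARD or t == p for t, p in zip(window, pattern)):
            hits.append(i)
    return hits


# Hypothetical text with d = 2 wildcard groups (of lengths 1 and 2).
sample_text = "AC?GT??ACG"
print(naive_wildcard_occurrences(sample_text, "ACG"))  # [0, 7]
print(naive_wildcard_occurrences(sample_text, "TAG"))  # [4]
```

In the sample, "ACG" matches at position 0 because the wildcard at position 2 can stand for G, and "TAG" matches at position 4 because the two-wildcard group can stand for "AG". An index for this problem must report the same positions without scanning the whole text for every query.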