Compressed text indexing with wildcards

Authors:
Wing-Kai Hon;Tsung-Han Ku;Rahul Shah;Sharma V. Thankachan;Jeffrey Scott Vitter
Affiliations:
National Tsing Hua University, Taiwan;National Tsing Hua University, Taiwan;Louisiana State University, USA;Louisiana State University, USA;The University of Kansas, USA
Venue:
Journal of Discrete Algorithms
Year:
2013

Citing 21

Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing.
A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM).
Text indexing and dictionary matching with one error. Journal of Algorithms.
Deterministic sorting in O(n log log n) time and linear space. STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing.
Sparse Suffix Trees. COCOON '96 Proceedings of the Second Annual International Conference on Computing and Combinatorics.
Dictionary matching and indexing with errors and don't cares. STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing.
Indexing compressed text. Journal of the ACM (JACM).
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM Journal on Computing.
Note: A simple storage scheme for strings achieving entropy bounds. Theoretical Computer Science.
Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms (TALG).
Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms (TALG).
Geometric Burrows-Wheeler Transform: Linking Range Searching and Text Indexing. DCC '08 Proceedings of the Data Compression Conference.
Compressed Index for Dictionary Matching. DCC '08 Proceedings of the Data Compression Conference.
Orthogonal range searching in linear and almost-linear space. Computational Geometry: Theory and Applications.
Succinct Text Indexing with Wildcards. SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval.
On Entropy-Compressed Text Indexing in External Memory. SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval.
Space efficient indexes for string matching with don't cares. ISAAC '07 Proceedings of the 18th international conference on Algorithms and computation.
Succinct dictionary matching with no slowdown. CPM '10 Proceedings of the 21st annual conference on Combinatorial pattern matching.
Faster compressed dictionary matching. SPIRE '10 Proceedings of the 17th international conference on String processing and information retrieval.
Orthogonal range searching on the RAM, revisited. Proceedings of the twenty-seventh annual symposium on Computational geometry.
Succincter text indexing with wildcards. CPM '11 Proceedings of the 22nd annual conference on Combinatorial pattern matching.


Abstract

Let $T = T_1 \phi^{k_1} T_2 \phi^{k_2} \cdots \phi^{k_d} T_{d+1}$ be a text of total length $n$, where the characters of each $T_i$ are chosen from an alphabet $\Sigma$ of size $\sigma$, and $\phi$ denotes a wildcard symbol. The text indexing with wildcards problem is to index $T$ such that, given a query pattern $P$, we can locate the occurrences of $P$ in $T$ efficiently. This problem has been applied to indexing genomic sequences that contain single-nucleotide polymorphisms (SNPs), because SNPs can be modeled as wildcards. Recently, Tam et al. (2009) and Thachuk (2011) proposed succinct indexes for this problem. In this paper, we present the first compressed index for this problem, which takes only $nH_h + o(n \log \sigma) + O(d \log n)$ bits of space, where $H_h$ is the $h$th-order empirical entropy ($h = o(\log_\sigma n)$) of $T$.
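To make the problem statement concrete, the following is a minimal brute-force sketch of what "locating the occurrences of P in T" means when T contains wildcards. It is not the compressed index of the paper; it only illustrates the matching semantics, in O(n|P|) time with no index at all. The function name `naive_wildcard_occurrences` and the use of `?` as a stand-in for the wildcard symbol are assumptions made for this illustration.

```python
WILDCARD = "?"  # assumed stand-in for the wildcard symbol (phi in the abstract)


def naive_wildcard_occurrences(text: str, pattern: str,
                               wildcard: str = WILDCARD) -> list[int]:
    """Return all 0-based positions where `pattern` occurs in `text`.

    A wildcard character in the text matches any pattern character.
    The pattern itself is assumed to contain no wildcards.
    """
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        window = text[i:i + m]
        # Each text character must either be a wildcard or equal
        # the aligned pattern character.
        if all(t == wildcard or t == p for t, p in zip(window, pattern)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Text segments "AC" and "TA" separated by a single wildcard.
    print(naive_wildcard_occurrences("AC?TA", "CGT"))
    print(naive_wildcard_occurrences("AC?TA", "A"))
```

An index for this problem has to report the same positions as this scan while avoiding the pass over the whole text at query time.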