Succinct Text Indexing with Wildcards

Authors:
Alan Tam;Edward Wu;Tak-Wah Lam;Siu-Ming Yiu
Affiliations:
Department of Computer Science, University of Hong Kong, Hong Kong;Department of Computer Science, University of Hong Kong, Hong Kong;Department of Computer Science, University of Hong Kong, Hong Kong;Department of Computer Science, University of Hong Kong, Hong Kong
Venue:
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Year:
2009

Abstract

A succinct text index uses space proportional to the text itself, say, 2n log σ bits for a text of n characters over an alphabet of size σ. In the past few years, several exciting results have led to succinct indexes that support efficient pattern matching. In this paper we present the first succinct index for a text that contains wildcards. The space complexity of our index is (3 + o(1)) n log σ + O(ℓ log n) bits, where ℓ is the number of wildcard groups in the text. Such an index finds applications in indexing genomic sequences that contain single-nucleotide polymorphisms (SNPs), which can be modeled as wildcards. In the course of deriving the above result, we also obtain an alternative succinct index of a set of d patterns for the purpose of dictionary matching. Compared with the succinct index in the literature, the new index doubles the size (precisely, from n log σ to 2n log σ bits, where n is the total length of all patterns), yet it reduces the matching time to O(m log σ + m log d + occ), where m is the length of the query text and occ is the number of occurrences reported. It is worth mentioning that the time complexity no longer depends on the total dictionary size.
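
To make the problem setting concrete, the following is a minimal brute-force sketch of what "a text that contains wildcards" means for matching. It is not the index described in the abstract. The wildcard symbol `*`, the function names, and the choice of base-2 logarithms are assumptions made only for illustration. A wildcard group is taken here to be a maximal run of consecutive wildcard positions.

```python
import math

# Assumed wildcard symbol; chosen only for this illustration.
WILDCARD = "*"


def wildcard_groups(text: str) -> int:
    """Count maximal runs of consecutive wildcard characters in the text."""
    groups = 0
    prev_wild = False
    for ch in text:
        is_wild = ch == WILDCARD
        if is_wild and not prev_wild:
            groups += 1  # a new run of wildcards starts here
        prev_wild = is_wild
    return groups


def naive_match(text: str, pattern: str) -> list[int]:
    """Return every start position where the pattern matches the text.

    A wildcard in the text matches any pattern character. This is an
    O(n * m) scan with no index, shown only to define the semantics.
    """
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(t == WILDCARD or t == p for t, p in zip(text[i:i + m], pattern))
    ]


def leading_space_bits(n: int, alphabet_size: int) -> float:
    """Leading term 3 * n * log2(alphabet_size) of the stated space bound.

    Lower-order terms (the o(1) factor and the wildcard-group term) are
    ignored; base-2 logarithms are an assumption.
    """
    return 3 * n * math.log2(alphabet_size)


if __name__ == "__main__":
    text = "AC*TAC**GT"  # hypothetical DNA fragment with SNP positions as wildcards
    print(wildcard_groups(text))
    print(naive_match(text, "ACG"))
    print(leading_space_bits(len(text), 4))
```

In the sample text, the single wildcard at position 2 and the run at positions 6 and 7 form two groups, and the pattern `ACG` matches wherever each text character either equals the pattern character or is a wildcard.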