Succinct Text Indexing with Wildcards

  • Authors:
  • Alan Tam; Edward Wu; Tak-Wah Lam; Siu-Ming Yiu

  • Affiliations:
  • Department of Computer Science, University of Hong Kong, Hong Kong (all authors)

  • Venue:
  • SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
  • Year:
  • 2009

Abstract

A succinct text index uses space proportional to the text itself, say, 2n log σ bits for a text of n characters over an alphabet of size σ. In the past few years, several exciting results have led to succinct indexes that support efficient pattern matching. In this paper we present the first succinct index for a text that contains wildcards. The space complexity of our index is (3 + o(1)) n log σ + O(ℓ log n) bits, where ℓ is the number of wildcard groups in the text. Such an index finds applications in indexing genomic sequences that contain single-nucleotide polymorphisms (SNPs), which can be modeled as wildcards. In the course of deriving the above result, we also obtain an alternative succinct index of a set of d patterns for the purpose of dictionary matching. Compared with the succinct index in the literature, the new index doubles the size (precisely, from n log σ to 2n log σ, where n is the total length of all patterns), yet it reduces the matching time to O(m log σ + m log d + occ), where m is the length of the query text. It is worth mentioning that the time complexity no longer depends on the total dictionary size.
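
To make the query semantics in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the succinct index described in the paper, only a naive baseline showing what "matching a pattern against a text that contains wildcards" could mean. Several points are assumptions made for illustration: the wildcard symbol `*`, the reading of a "wildcard group" as a maximal run of consecutive wildcards, base-2 logarithms, and all function names.

```python
import math

WILDCARD = "*"  # hypothetical wildcard symbol chosen for this sketch


def count_wildcard_groups(text: str) -> int:
    """Count maximal runs of consecutive wildcards.

    Assumption: a 'wildcard group' is one such maximal run.
    """
    groups = 0
    prev_wild = False
    for ch in text:
        is_wild = ch == WILDCARD
        if is_wild and not prev_wild:
            groups += 1  # a new run of wildcards starts here
        prev_wild = is_wild
    return groups


def naive_match(text: str, pattern: str) -> list[int]:
    """Return all start positions where the pattern matches the text.

    A wildcard in the text matches any pattern character. This is a
    brute-force O(n * m) scan, not the indexed search from the paper.
    """
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if all(t == WILDCARD or t == p for t, p in zip(window, pattern)):
            hits.append(i)
    return hits


def leading_space_bits(n: int, sigma: int) -> float:
    """Leading term 3 * n * log2(sigma) of the stated space bound.

    The o(1) factor and the O(l log n) term are ignored here.
    """
    return 3 * n * math.log2(sigma)
```

For example, with a DNA-like text such as "AC*GT**A", `count_wildcard_groups` counts two groups under the assumption above, and `naive_match("AC*GT", "CAG")` reports a match at position 1 because the wildcard at text position 2 can stand for "A".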