ESP-index: a compressed index based on edit-sensitive parsing

  • Authors: Shirou Maruyama; Masaya Nakahara; Naoya Kishiue; Hiroshi Sakamoto

  • Affiliations: Kyushu University, Fukuoka-shi, Fukuoka; Kyushu Institute of Technology, Iizuka-shi, Fukuoka; Kyushu Institute of Technology, Iizuka-shi, Fukuoka; Kyushu Institute of Technology, Iizuka-shi, Fukuoka and PRESTO JST, Saitama, Japan

  • Venue: SPIRE'11: Proceedings of the 18th International Conference on String Processing and Information Retrieval
  • Year: 2011

Abstract

We propose a compressed self-index based on edit-sensitive parsing (ESP). Given a string S, its ESP tree is equivalent to a context-free grammar deriving just S, which can be represented as a DAG G. Finding a pattern P in S is reduced to embedding P into G. Succinct data structures are adopted, and G is decomposed into two LOUDS bit strings and a single array for a permutation, requiring (1 + ε)n log n + 4n + o(n) bits for any 0 < ε < 1, where n corresponds to the number of different symbols in the grammar. The time to count the occurrences of P in S is O((log* u / ε)(m log n + occ_c (log m log u))), where m = |P|, u = |S|, and occ_c is the number of occurrences of a maximal common subtree in the ESP trees of P and S. Using an additional array of n log u bits, our index supports locating P and displaying substrings of S. Locating time is the same as counting time, and the time to display a substring of length m is O(m + log u).
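
As a rough illustration of the LOUDS (level-order unary degree sequence) representation mentioned in the abstract, the following Python sketch encodes a small ordinal tree as a bit string and recovers the children of a node from the bits alone. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`louds_encode`, `select0`, `children_of`), the dictionary input format, and the example tree are all hypothetical. The rank/select operations are done by linear scans here, whereas a succinct structure would answer them in constant time using o(n) extra bits. How the paper splits the DAG G into two trees is not shown.

```python
from collections import deque


def louds_encode(children, root):
    """Return (bit string, BFS node order) for an ordinal tree.

    children: dict mapping a node to its ordered list of children
              (hypothetical input format chosen for this sketch).
    """
    bits = ["10"]                 # super-root whose only child is the root
    order = []                    # nodes in breadth-first (level) order
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        kids = children.get(node, [])
        bits.append("1" * len(kids) + "0")   # degree written in unary
        queue.extend(kids)
    return "".join(bits), order


def select0(bits, k):
    """Position of the k-th '0' (k is 1-based); naive linear scan."""
    seen = 0
    for pos, bit in enumerate(bits):
        if bit == "0":
            seen += 1
            if seen == k:
                return pos
    raise IndexError("fewer than k zeros in the bit string")


def children_of(bits, i):
    """BFS indices (0-based) of the children of the i-th node in BFS order."""
    start = select0(bits, i + 1) + 1      # node i's unary degree starts here
    rank1 = bits.count("1", 0, start)     # number of 1-bits before `start`
    result = []
    pos = start
    while bits[pos] == "1":
        # The (rank1 + 1)-th 1-bit overall stands for BFS node `rank1`,
        # because the very first 1-bit (super-root) stands for node 0.
        result.append(rank1)
        rank1 += 1
        pos += 1
    return result


# Example tree (hypothetical): a has children b and c; b has child d.
tree = {"a": ["b", "c"], "b": ["d"]}
bits, order = louds_encode(tree, "a")
kids_of_a = [order[j] for j in children_of(bits, 0)]
```

A tree with k nodes is encoded in 2k + 1 bits this way, so two such bit strings over roughly n nodes each take about 4n bits, which is consistent with the 4n term in the space bound quoted above.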