Fully compressed suffix trees

  • Authors:
  • Luís M. S. Russo; Gonzalo Navarro; Arlindo L. Oliveira

  • Affiliations:
  • INESC-ID/Instituto Superior Técnico, Technical University of Lisbon, Portugal; University of Chile, Santiago, Chile; INESC-ID/Instituto Superior Técnico, Technical University of Lisbon, Portugal

  • Venue:
  • ACM Transactions on Algorithms (TALG)
  • Year:
  • 2011

Abstract

Suffix trees are by far the most important data structure in stringology, with a myriad of applications in fields like bioinformatics and information retrieval. Classical representations of suffix trees require Θ(n log n) bits of space for a string of size n. This is considerably more than the n log₂ σ bits needed for the string itself, where σ is the alphabet size. The size of suffix trees has been a barrier to their wider adoption in practice. Recent compressed suffix tree representations require just the space of the compressed string plus Θ(n) extra bits. This is already spectacular, but the linear extra bits are still unsatisfactory when σ is small, as in DNA sequences. In this article, we introduce the first compressed suffix tree representation that breaks this Θ(n)-bit space barrier. The Fully Compressed Suffix Tree (FCST) representation requires only sublinear space on top of the compressed text size, and supports a wide set of navigational operations in almost logarithmic time. This includes extracting arbitrary text substrings, so the FCST replaces the text using almost the same space as the compressed text. An essential ingredient of FCSTs is the lowest common ancestor (LCA) operation. We reveal important connections between LCAs and suffix tree navigation. We also describe how to make FCSTs dynamic, that is, support updates to the text. The dynamic FCST also supports several operations. In particular, it can build the static FCST within optimal space and polylogarithmic time per symbol. Our theoretical results are also validated experimentally, showing that FCSTs are very effective in practice as well.
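
To make the space gap mentioned in the abstract concrete, the following is a minimal sketch that compares the n log₂ σ bits of the plain string with an illustrative Θ(n log n) cost for a classical suffix tree. The function names, the constant `c`, and the sample values n = 1,000,000 and σ = 4 (a DNA-like alphabet) are assumptions chosen for illustration; they are not taken from the article, and real pointer-based implementations typically have a constant well above 1.

```python
import math


def plain_text_bits(n: int, sigma: int) -> float:
    """Bits needed to store a string of length n over an alphabet of size sigma."""
    return n * math.log2(sigma)


def classical_tree_bits(n: int, c: float = 1.0) -> float:
    """Illustrative Theta(n log n) estimate: c * n * log2(n) bits.

    The constant c is hypothetical; it only fixes the scale of the estimate.
    """
    return c * n * math.log2(n)


# Assumed sample parameters: one million symbols over a 4-letter alphabet.
n, sigma = 1_000_000, 4

text_bits = plain_text_bits(n, sigma)   # 2 bits per symbol
tree_bits = classical_tree_bits(n)      # about 20 bits per symbol when c = 1
ratio = tree_bits / text_bits

print(f"text: {text_bits:.0f} bits, tree estimate: {tree_bits:.0f} bits, ratio: {ratio:.2f}")
```

Even with c = 1 the estimate is roughly ten times the size of the text at this scale, and the ratio grows with n, which is why extra space that is linear or sublinear in n matters for small alphabets.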