Ultra-succinct representation of ordered trees with applications

  • Authors:
  • Jesper Jansson;Kunihiko Sadakane;Wing-Kin Sung

  • Affiliations:
  • Ochanomizu University, 2-1-1 Otsuka, Bunkyo-ku, Tokyo 112-8610, Japan;National Institute of Informatics (NII), Hitotsubashi 2-1-2, Chiyoda-ku, Tokyo 101-8430, Japan;Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore 117543

  • Venue:
  • Journal of Computer and System Sciences
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

There exist two well-known succinct representations of ordered trees: BP (balanced parenthesis) (Munro and Raman, 2001) [20] and DFUDS (depth first unary degree sequence) (Benoit et al., 2005) [1]. Both have size 2n+o(n) bits for n-node trees, which asymptotically matches the information-theoretic lower bound. Importantly, many fundamental operations on trees can be done in constant time on the word RAM when using BP or DFUDS, for example finding the parent, the first child, the next sibling, the number of descendants, etc. Although the space needed to store the BP or DFUDS representation of an ordered tree matches the lower bound, this is not optimal when we consider encodings for certain special classes of trees such as trees in which every internal node has exactly two children. In this paper, we introduce a new, conditional entropy for trees called the tree degree entropy, and give a succinct tree representation with matching size. We call such a representation an ultra-succinct data structure. We show how to modify the DFUDS representation to obtain a ''compressed DFUDS'', and as a consequence, a tree in which every internal node has exactly two children can be represented in n+o(n) bits. We also describe applications of the compressed DFUDS representation to ultra-succinct compressed suffix trees and labeled trees.