Structuring labeled trees for optimal succinctness, and beyond

Authors:
Paolo Ferragina;Fabrizio Luccio;Giovanni Manzini;S. Muthukrishnan
Affiliations:
University of Pisa;University of Pisa;University of Piemonte Orientale;Rutgers University
Venue:
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
Year:
2005

Citing 24
Cited 39

Syntax-directed compression of program files

Software—Practice & Experience
The design and implementation of a certifying compiler

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Managing gigabytes (2nd ed.): compressing and indexing documents and images

Managing gigabytes (2nd ed.): compressing and indexing documents and images
Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
XMill: an efficient compressor for XML data

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Trie memory

Communications of the ACM
Space efficient suffix trees

Journal of Algorithms
An analysis of the Burrows-Wheeler transform

Journal of the ACM (JACM)
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
High-order entropy-compressed text indexes

SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Representing Trees of Higher Degree

WADS '99 Proceedings of the 6th International Workshop on Algorithms and Data Structures
Statistical Models for Term Compression

DCC '00 Proceedings of the Conference on Data Compression
Succinct representation of balanced parentheses, static trees and planar graphs

FOCS '97 Proceedings of the 38th Annual Symposium on Foundations of Computer Science
Opportunistic data structures with applications

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Context coding of parse trees

DCC '95 Proceedings of the Conference on Data Compression
Compressing XML with Multiplexed Hierarchical PPM Models

DCC '01 Proceedings of the Data Compression Conference
Succinct ordinal trees with level-ancestor queries

SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Boosting textual compression in optimal linear time

Journal of the ACM (JACM)
Space-efficient static trees and graphs

SFCS '89 Proceedings of the 30th Annual Symposium on Foundations of Computer Science
Efficient tree pattern matching

SFCS '89 Proceedings of the 30th Annual Symposium on Foundations of Computer Science
Linear pattern matching algorithms

SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Smoothing and compression with stochastic k-testable tree languages

Pattern Recognition
Simple linear work suffix array construction

ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming

Rank/select operations on large alphabets: a tool for text indexing

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Squeezing succinct data structures into entropy bounds

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Compressing and searching XML data via two zips

Proceedings of the 15th international conference on World Wide Web
Succinct ordinal trees with level-ancestor queries

ACM Transactions on Algorithms (TALG)
User modeling for personalized Web search with self-organizing map: Research Articles

Journal of the American Society for Information Science and Technology
Compressed representations of sequences and full-text indexes

ACM Transactions on Algorithms (TALG)
Ultra-succinct representation of ordered trees

SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Succinct indexes for strings, binary relations and multi-labeled trees

SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
From first principles to the Burrows and Wheeler transform and beyond, via combinatorial optimization

Theoretical Computer Science
Adaptive searching in succinctly encoded binary relations and tree-structured documents

Theoretical Computer Science
An extension of the Burrows–Wheeler Transform

Theoretical Computer Science
Efficient memory representation of XML document trees

Information Systems
On searching compressed string collections cache-obliviously

Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
On Compact Representations of All-Pairs-Shortest-Path-Distance Matrices

CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
An Improved Succinct Representation for Dynamic k-ary Trees

CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
On the Redundancy of Succinct Data Structures

SWAT '08 Proceedings of the 11th Scandinavian workshop on Algorithm Theory
Compressed text indexes: From theory to practice

Journal of Experimental Algorithmics (JEA)
Succinct geometric indexes supporting point location queries

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
XML compression techniques: A survey and comparison

Journal of Computer and System Sciences
Universal Succinct Representations of Trees?

ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I
Succinct Orthogonal Range Search Structures on a Grid with Applications to Text Indexing

WADS '09 Proceedings of the 11th International Symposium on Algorithms and Data Structures
Compressing and indexing labeled trees, with applications

Journal of the ACM (JACM)
Compression of concatenated web pages using XBW

SOFSEM'08 Proceedings of the 34th conference on Current trends in theory and practice of computer science
Note: On compact representations of All-Pairs-Shortest-Path-Distance matrices

Theoretical Computer Science
Fully-functional succinct trees

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
A web search engine model based on index-query bit-level compression

Proceedings of the 1st International Conference on Intelligent Semantic Web-Services and Applications
Compression, indexing, and retrieval for massive string data

CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Faster compressed dictionary matching

SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Succinct indexes for strings, binary relations and multilabeled trees

ACM Transactions on Algorithms (TALG)
Compressed string dictionaries

SEA'11 Proceedings of the 10th international conference on Experimental algorithms
Succinct 2D dictionary matching with no slowdown

WADS'11 Proceedings of the 12th international conference on Algorithms and data structures
Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays

SIAM Journal on Computing
Adaptive searching in succinctly encoded binary relations and tree-structured documents

CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Statistical encoding of succinct data structures

CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Reducing the space requirement of LZ-Index

CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Succinct geometric indexes supporting point location queries

ACM Transactions on Algorithms (TALG)
Dynamic rank-select structures with applications to run-length encoded texts

CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
A framework for dynamizing succinct data structures

ICALP'07 Proceedings of the 34th international conference on Automata, Languages and Programming
Development of a Novel Compressed Index-Query Web Search Engine Model

International Journal of Information Technology and Web Engineering

Abstract

Consider an ordered, static tree T on t nodes where each node has a label from an alphabet Σ. Tree T may be of arbitrary degree and of arbitrary shape. Suppose we wish to support basic navigational operations such as finding the parent of node u, the i-th child of u, or any child of u with label a.

In a seminal work over fifteen years ago, Jacobson [15] observed that pointer-based tree representations are wasteful in space and introduced the notion of succinct data structures. He studied the special case of unlabeled trees and presented a succinct data structure of 2t+o(t) bits supporting navigational operations in O(1) time. This space asymptotically matches the information-theoretic lower bound averaged over all trees. This led to a slew of results on succinct data structures for arrays, trees, strings and multisets. Still, for the fundamental problem of structuring labeled trees succinctly, few results, if any, exist, even though labeled trees arise frequently in practice, for example as data in markup text (XML) or in augmented data structures.

We present a novel approach to the problem of succinct manipulation of labeled trees by designing what we call the xbw transform of the tree, in the spirit of the well-known Burrows-Wheeler transform for strings. The xbw transform uses path-sorting and grouping to linearize the labeled tree T into two coordinated arrays, one capturing the structure and the other the labels.
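
The path-sorting and grouping idea can be illustrated with a minimal Python sketch. It is not taken from the abstract: it assumes one common formulation of the transform, in which each node is keyed by the labels on its upward path (parent first, root last), nodes are listed in preorder and stably sorted by that key, and the output is a 0/1 "last child" array plus a label array. The names `Node` and `xbw_sketch` are hypothetical, and details such as distinguishing leaves from internal nodes are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def xbw_sketch(root: Node) -> Tuple[List[int], List[str]]:
    """Return (last, labels): two arrays aligned position by position.

    Assumed formulation: sort nodes by their upward label path,
    breaking ties by preorder position (via a stable sort).
    """
    rows = []  # (upward_path, is_last_child, label), collected in preorder

    def visit(node: Node, upward_path: Tuple[str, ...], is_last: int) -> None:
        rows.append((upward_path, is_last, node.label))
        # A child's upward path starts with this node's label.
        child_path = (node.label,) + upward_path
        for i, child in enumerate(node.children):
            visit(child, child_path, int(i == len(node.children) - 1))

    visit(root, (), 1)
    # Python's sort is stable, so nodes with equal paths keep preorder order.
    rows.sort(key=lambda row: row[0])
    last = [row[1] for row in rows]      # structure: 1 marks a last child
    labels = [row[2] for row in rows]    # labels, grouped by context
    return last, labels


# Small example: A has children B, C, B; the two B nodes have children
# (D, E) and (E); C has child D.
example = Node("A", [
    Node("B", [Node("D"), Node("E")]),
    Node("C", [Node("D")]),
    Node("B", [Node("E")]),
])
last, labels = xbw_sketch(example)
print(last)    # [1, 0, 0, 1, 0, 1, 1, 1]
print(labels)  # ['A', 'B', 'C', 'B', 'D', 'E', 'E', 'D']
```

In this example the children of both nodes labeled B end up contiguous (D, E, E), because they share the same upward path. This is the grouping effect that makes the label array amenable to compression and searching.
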
Using the properties of the xbw transform, we (i) derive the first-known (near-)optimal results for succinct representation of labeled trees with O(1) time for navigation operations, (ii) optimally support the powerful subpath search operation for the first time, and (iii) introduce a notion of tree entropy and present linear-time algorithms for compressing a given labeled tree up to its entropy, beyond the information-theoretic lower bound averaged over all tree inputs. Our xbw transform is simple and likely to spur new results in the theory of tree compression and indexing, and