XML tree structures can conveniently be represented as ordered unranked trees. Because XML markup is highly repetitive, these trees can be compressed effectively using dictionary-based methods such as minimal directed acyclic graphs (DAGs) or straight-line context-free (SLCF) tree grammars. While minimal SLCF tree grammars are in general smaller than minimal DAGs, they cannot be computed in polynomial time unless P=NP. Here, we present TreeRePair, a new linear-time algorithm for computing small SLCF tree grammars, and show that it greatly outperforms BPLEX, the best previously known algorithm. TreeRePair generalizes Larsson and Moffat's RePair string compression algorithm to trees. SLCF tree grammars can serve as efficient in-memory representations of trees. Using TreeRePair, we produce the smallest queryable memory representation of ordered trees that we are aware of. Our experiments over a large corpus of commonly used XML documents show that tree traversals over TreeRePair grammars are 14 times slower than over pointer structures and 5 times slower than over succinct trees, while memory consumption is only 1/43 and 1/6 of theirs, respectively. For file compression, we show that a Huffman-based coding of TreeRePair grammars gives compression ratios comparable to those of the best known XML file compressors.
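To make the DAG baseline mentioned above concrete, the following is a minimal sketch of how a minimal DAG can be built by sharing identical subtrees (hash-consing). It is not TreeRePair and not the authors' implementation; the tuple-based tree encoding and the names `build_minimal_dag` and `tree_size` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a tree node is (label, [child, child, ...]).
# A DAG node is (label, (child_id, child_id, ...)).
DagNode = Tuple[str, Tuple[int, ...]]


def build_minimal_dag(tree) -> Tuple[int, List[DagNode]]:
    """Share every repeated subtree; return (root id, list of DAG nodes)."""
    ids: Dict[DagNode, int] = {}   # maps a subtree's signature to its id
    nodes: List[DagNode] = []      # nodes[i] is the DAG node with id i

    def visit(node) -> int:
        label, children = node
        # Children are processed first, so equal subtrees get equal ids
        # and therefore produce equal signatures here.
        key = (label, tuple(visit(child) for child in children))
        if key not in ids:
            ids[key] = len(nodes)
            nodes.append(key)
        return ids[key]

    # Recursive for brevity; very deep documents would need an explicit stack.
    return visit(tree), nodes


def tree_size(tree) -> int:
    """Number of nodes in the uncompressed tree."""
    _, children = tree
    return 1 + sum(tree_size(child) for child in children)


if __name__ == "__main__":
    # Repetitive markup: three structurally identical <book> elements.
    book = ("book", [("title", []), ("author", [])])
    doc = ("lib", [book, book, book])
    root, nodes = build_minimal_dag(doc)
    print(tree_size(doc), "tree nodes ->", len(nodes), "DAG nodes")
    print("root:", nodes[root])
```

A DAG can only share complete repeated subtrees. SLCF tree grammars, as described in the abstract, can additionally share repeated patterns inside a tree, which is why minimal grammars are in general smaller than minimal DAGs.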