Unranked trees can be represented by their minimal dag (directed acyclic graph). For XML documents this achieves high compression ratios because the markup is highly repetitive. Unranked trees are also often represented through first child/next sibling (fcns) encoded binary trees. We study the difference in size (= number of edges) between the minimal dag of an unranked tree and the minimal dag of its fcns encoded binary tree. One main finding is that the size of the dag of the binary tree can never be smaller than the square root of the size of the minimal dag, and that there are examples that match this bound. We introduce a new combined structure, the hybrid dag, which is guaranteed to be no larger than either of the two dags. Interestingly, we find through experiments that last child/previous sibling encodings are much better for XML compression via dags than fcns encodings. This is because optional elements are more likely to appear towards the end of child sequences.
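The size comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that trees are nested tuples `(label, children)`, that the minimal dag is obtained by sharing structurally equal subtrees, and that size counts edges (with repeated edges counted separately and absent fcns children not counted). The function names `dag_size`, `fcns`, and `binary_dag_size` are hypothetical.

```python
def dag_size(tree):
    """Edge count of the minimal dag of an unranked tree (label, children)."""
    seen = set()  # structurally distinct subtrees visited so far

    def visit(t):
        if t in seen:
            return 0  # shared subtree: its edges are already counted
        seen.add(t)
        _, children = t
        return len(children) + sum(visit(c) for c in children)

    return visit(tree)


def fcns(tree):
    """First child/next sibling encoding as a binary tree (label, left, right)."""
    def encode(siblings):
        if not siblings:
            return None  # no first child / no next sibling
        (label, children), rest = siblings[0], siblings[1:]
        return (label, encode(children), encode(rest))

    return encode((tree,))


def binary_dag_size(btree):
    """Edge count of the minimal dag of a binary tree; None children add no edge."""
    seen = set()

    def visit(t):
        if t is None or t in seen:
            return 0
        seen.add(t)
        _, left, right = t
        edges = sum(1 for c in (left, right) if c is not None)
        return edges + visit(left) + visit(right)

    return visit(btree)


a, b, c = ("a", ()), ("b", ()), ("c", ())

# Repeated whole subtrees favour the dag of the unranked tree.
t1 = ("r", (("b", (a, a)), ("b", (a, a))))
print(dag_size(t1), binary_dag_size(fcns(t1)))  # 4 5

# Repeated sibling suffixes favour the dag of the fcns encoding.
t2 = ("r", (("f", (a, b, c)), ("g", (b, c))))
print(dag_size(t2), binary_dag_size(fcns(t2)))  # 7 6
```

The two toy trees show that neither representation is always smaller: the first shares identical subtrees, while the second shares only the sibling suffix `b, c`, which the fcns encoding turns into a common binary subtree.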