Efficient memory representation of XML document trees

Authors:
Giorgio Busatto;Markus Lohrey;Sebastian Maneth
Affiliations:
Department für Informatik, Universität Oldenburg, Germany;Institut für Informatik, Universität Leipzig, Johannisgasse 26, 04103 Leipzig, Germany;National ICT Australia Ltd., Australia and University of New South Wales, Sydney, Australia
Venue:
Information Systems
Year:
2008

Abstract

Implementations that load XML documents and give access to them via, e.g., the DOM suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document itself. A considerable amount of memory is needed to store the tree structure of the XML document. In this paper, a technique is presented that represents the tree structure of an XML document efficiently. The representation exploits the high regularity in XML documents by compressing their tree structure, that is, by detecting and removing repetitions of tree patterns. Formally, context-free tree grammars that generate only a single tree are used for tree compression. The functionality of basic tree operations, such as traversal along edges, is preserved under this compressed representation. This allows queries (and, in particular, bulk operations) to be executed directly, without prior decompression. The complexity of certain computational problems, such as validation against XML types or testing equality, is investigated for compressed input trees.
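
The idea of removing repeated tree patterns and then working on the compressed form can be illustrated with a much simpler technique than the one the abstract describes. The Python sketch below shares only repeated complete subtrees, which yields a DAG, i.e. the special case of a single-tree grammar whose rules take no parameters. It is not the paper's algorithm. All names (`compress`, `label`, `child`, `count_nodes`) and the sample document are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: sharing of repeated subtrees (a DAG), which is a
# restricted form of grammar-based tree compression. Not the paper's method.

def compress(tree, rules, index):
    """Return the id of the rule generating `tree`; equal subtrees share one rule."""
    node_label, children = tree
    key = (node_label, tuple(compress(c, rules, index) for c in children))
    if key not in index:
        index[key] = len(rules)   # new rule: label plus ids of child rules
        rules.append(key)
    return index[key]

def label(rule_id, rules):
    """Label of the node generated by a rule."""
    return rules[rule_id][0]

def child(rule_id, i, rules):
    """Follow the edge to the i-th child without unfolding the tree."""
    return rules[rule_id][1][i]

def count_nodes(rule_id, rules, memo):
    """Number of nodes of the unfolded tree, computed on the compressed form."""
    if rule_id not in memo:
        _, kids = rules[rule_id]
        memo[rule_id] = 1 + sum(count_nodes(k, rules, memo) for k in kids)
    return memo[rule_id]

# Hypothetical document: a library with three structurally identical books.
book = ("book", (("title", ()), ("author", ())))
doc = ("library", (book, book, book))

rules, index = [], {}
root = compress(doc, rules, index)

print(len(rules))                      # rules needed for the whole document
print(count_nodes(root, rules, {}))    # nodes in the uncompressed tree
print(label(child(root, 1, rules), rules))
```

In this sketch the three `book` subtrees are stored once, and both the traversal step and the node count run directly on the rule table. Grammars with parameters, as used in the paper, can additionally share patterns that occur inside a tree with differing subtrees below them, which a DAG cannot express.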