Engineering succinct DOM

  • Authors:
  • O'Neil Delpratt;Rajeev Raman;Naila Rahman

  • Affiliations:
  • University of Leicester, Leicester, UK;University of Leicester, Leicester, UK;University of Leicester, Leicester, UK

  • Venue:
  • EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

We describe the engineering of Succinct DOM (SDOM), a DOM implementation, written in C++, which is suitable for in-memory representation of large static XML documents. SDOM avoids the use of pointers, and is based upon succinct data structures, which use an information-theoretically minimum amount of space to represent an object. SDOM gives a space-efficient in-memory representation, with stable and predictable memory usage. The space used by SDOM is an order of magnitude less than that used by a standard C++ DOM representation such as Xerces, but SDOM is extremely fast: navigation is in some cases faster than for a pointer-based representation such as Xerces (even for moderate-sized documents which can comfortably be loaded into main memory by Xerces). A variant, SDOM-CT, applies bzip-based compression to textual and attribute data, and its space usage is comparable with "queryable" XML compressors. Some of these compressors support navigation and/or querying (e.g. subpath queries) of the compressed file. SDOM-CT does not support querying directly, but remains extremely fast: it is several orders of magnitude faster for navigation than queryable XML compressors that support navigation (and only a few times slower than say Xerces).