A space efficient XML DOM parser

  • Authors:
  • Fangju Wang;Jing Li;Hooman Homayounfar

  • Affiliations:
  • Department of Computing and Information Science, University of Guelph, Guelph, Ont., Canada N1G 2W1;Department of Computing and Information Science, University of Guelph, Guelph, Ont., Canada N1G 2W1;Department of Computing and Information Science, University of Guelph, Guelph, Ont., Canada N1G 2W1

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2007

Abstract

In many XML applications, parsing is a key operation. When the processing involves modifying data, random access, or visiting elements in an order different from the one in which they are stored, a DOM parser has to be used. A major problem with using a DOM parser is memory consumption. The DOM tree created from an XML document may be as large as 10 times the size of the original document. Maintaining the tree of a large document requires a large amount of memory and may cause costly swapping. In the worst case, a DOM parser cannot handle a document at all because of its size. In this research, we develop a space-efficient DOM parser called SEDOM. It is based on a new compression approach and a set of manipulation algorithms, which enable many DOM operations to be performed while the data are in compressed form, and which allow individual parts of a document to be compressed, decompressed, and manipulated. It can be used to manipulate very large XML documents efficiently. In this paper, we describe SEDOM and compare its performance with three existing DOM parsers and an XML compressor.
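
The general idea of keeping individual parts of a document compressed and inflating only the part being manipulated can be illustrated with a small sketch. This is not SEDOM's compression scheme or its algorithms, which the abstract does not detail. It is a toy illustration that assumes general-purpose zlib compression applied to each top-level child, and the names `LazySubtree` and `parse_partitioned` are hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET


class LazySubtree:
    """One child subtree held as zlib-compressed XML bytes (hypothetical helper)."""

    def __init__(self, element):
        # The tag stays uncompressed so a lookup by tag needs no decompression.
        self.tag = element.tag
        self._blob = zlib.compress(ET.tostring(element))

    def load(self):
        """Decompress and parse this subtree only."""
        return ET.fromstring(zlib.decompress(self._blob))

    def store(self, element):
        """Recompress the subtree after it has been modified."""
        self.tag = element.tag
        self._blob = zlib.compress(ET.tostring(element))


def parse_partitioned(xml_text):
    """Parse once, then keep each top-level child in compressed form.

    Assumption: the document is partitioned at the children of the root.
    A real parser would avoid building the full tree in the first place.
    """
    root = ET.fromstring(xml_text)
    return root.tag, [LazySubtree(child) for child in root]


# Build a small sample document with 100 <book> elements.
doc = "<library>" + "".join(
    f"<book id='{i}'><title>Title {i}</title></book>" for i in range(100)
) + "</library>"

root_tag, parts = parse_partitioned(doc)

# Random access and modification touch one compressed part, not the whole tree.
book = parts[42].load()
book.find("title").text = "Revised title"
parts[42].store(book)
```

Only the subtree at index 42 is ever decompressed in the last three lines; the other 99 parts stay in compressed form throughout.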