A space efficient XML DOM parser

Authors:
Fangju Wang;Jing Li;Hooman Homayounfar
Affiliations:
Department of Computing and Information Science, University of Guelph, Guelph, Ont., Canada N1G 2W1;Department of Computing and Information Science, University of Guelph, Guelph, Ont., Canada N1G 2W1;Department of Computing and Information Science, University of Guelph, Guelph, Ont., Canada N1G 2W1
Venue:
Data & Knowledge Engineering
Year:
2007



Abstract

Parsing is a key operation in many XML applications. When processing involves modifying data, accessing elements randomly, or visiting elements in an order different from the one in which they are stored, a DOM parser has to be used. A major problem with DOM parsers is memory consumption: the DOM tree created from an XML document may be as large as 10 times the size of the original document. Maintaining the tree of a large document requires a large amount of memory and may cause costly swapping. In the worst case, a DOM parser cannot handle a document at all because of its size. In this research, we develop a space-efficient DOM parser called SEDOM. It is based on a new compression approach and a set of manipulation algorithms that enable many DOM operations to be performed while the data are in compressed form, and that allow individual parts of a document to be compressed, decompressed, and manipulated. SEDOM can be used to manipulate very large XML documents efficiently. In this paper, we describe SEDOM and compare its performance with that of three existing DOM parsers and an XML compressor.
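
The idea of compressing, decompressing, and manipulating individual parts of a document can be illustrated with a minimal Python sketch. This is not SEDOM's compression scheme or its algorithms, which the abstract does not detail. It assumes zlib and the standard library's minidom as stand-ins, and the names LazySubtree and compress_children are hypothetical. Unlike SEDOM, which is described as performing many operations directly on compressed data, this sketch must decompress a subtree before touching it.

```python
import zlib
from xml.dom import minidom


class LazySubtree:
    """One child subtree kept as zlib-compressed XML text (illustrative only)."""

    def __init__(self, element):
        self.tag = element.tagName
        self._blob = zlib.compress(element.toxml().encode("utf-8"))

    def compressed_size(self):
        # Number of bytes currently held for this subtree.
        return len(self._blob)

    def load(self):
        # Decompress and parse only this subtree into a small DOM.
        xml_text = zlib.decompress(self._blob).decode("utf-8")
        return minidom.parseString(xml_text).documentElement

    def store(self, element):
        # Re-compress the subtree after it has been modified.
        self._blob = zlib.compress(element.toxml().encode("utf-8"))


def compress_children(xml_text):
    """Parse once, then keep each top-level child element in compressed form."""
    root = minidom.parseString(xml_text).documentElement
    return [
        LazySubtree(node)
        for node in root.childNodes
        if node.nodeType == node.ELEMENT_NODE
    ]


if __name__ == "__main__":
    doc = (
        "<lib>"
        "<book id='1'><title>A</title></book>"
        "<book id='2'><title>B</title></book>"
        "</lib>"
    )
    parts = compress_children(doc)
    book = parts[1].load()          # only the second subtree is expanded
    book.setAttribute("id", "20")   # modify it through the normal DOM API
    parts[1].store(book)            # and compress it again
    print(parts[1].load().toxml())
```

In this sketch only one subtree is held as a full DOM at a time, while its siblings stay compressed. The initial full parse in compress_children still builds the whole tree once, so the sketch shows the access pattern rather than a real memory saving.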