Document decomposition for XML compression: a heuristic approach

Authors:
Byron Choi
Affiliations:
Nanyang Technological University
Venue:
DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications
Year:
2006

Citing 15
Cited 3

Join indices

ACM Transactions on Database Systems (TODS)
A cost model for nearest neighbor search in high-dimensional data space

PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Discovering typical structures of documents: a road map approach

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Storing semistructured data with STORED

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
XMill: an efficient compressor for XML data

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Data mining: concepts and techniques

Data mining: concepts and techniques
SPARTAN: a model-based semantic compression system for massive data tables

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Semantic Compression and Pattern Extraction with Fascicles

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
XPRESS: a queriable compression for XML data

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Compressing XML with Multiplexed Hierarchical PPM Models

DCC '01 Proceedings of the Data Compression Conference
XGRIND: A Query-Friendly XML Compressor

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
ItCompress: An Iterative Semantic Compression Algorithm

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Vectorizing and Querying Large XML Repositories

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
XMark: a benchmark for XML data management

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Path queries on compressed XML

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

EXEM: Efficient XML data exchange management for mobile applications

Information Systems Frontiers
Service-oriented architecture for mobile applications

Proceedings of the 1st international workshop on Software architectures and mobility
A quantitative summary of XML structures

ER'06 Proceedings of the 25th international conference on Conceptual Modeling

Quantified Score

Hi-index	0.00

Visualization

Abstract

Sharing of common subtrees has been reported useful not only for XML compression but also for main-memory XML query processing. This method compresses subtrees only when they exhibit identical structure. Even slight irregularities among subtrees dramatically reduce the performance of compression algorithms of this kind. Furthermore, when XML documents are large, the chance of having large number of identical subtrees is inherently low. In this paper, we proposed a method of decomposing XML documents for better compression. We proposed a heuristic method of locating minor irregularities in XML documents. The irregularities are then projected out from the original XML document. We refered this process to as document decomposition. We demonstrated that better compression can be achieved by compressing the decomposed documents separately. Experimental results demonstrated that the compressed skeletons, for all real-world datasets, to our knowledge, fit comfortably into main memory of commodity computers nowadays. Preliminary results on querying compressed skeletons validate the effectiveness our approach.