Efficient Fragmentation of Large XML Documents

Authors:
Angela Bonifati;Alfredo Cuzzocrea
Affiliations:
ICAR Inst., National Research Council, Italy;DEIS Dept., University of Calabria, Italy
Venue:
DEXA '07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Year:
2007

Abstract

Fragmentation techniques for XML data are gaining momentum in both distributed and centralized XML query engines, and they pose new, largely unrecognized challenges to the community. Although the idea is not new and is clearly inspired by the classical divide et impera principle, fragmentation of XML trees has proven successful in boosting query performance and in cutting memory requirements. However, the fragmentation techniques considered so far have been driven by semantics, i.e., built around query predicates. In this paper, we propose a novel fragmentation technique that is founded on structural constraints of XML documents (size, tree-width, and tree-depth) and on special-purpose structure histograms that meaningfully summarize XML documents. This allows us to predict bounding intervals for the structural properties of the output XML fragments, supporting efficient query processing over distributed XML data. An experimental evaluation on some representative XML data sets confirms the effectiveness of our fragmentation methodology.
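
To make the notion of structure-driven fragmentation concrete, the following is a minimal illustrative sketch in Python. It is not the algorithm of the paper, whose details the abstract does not give. The function names `structural_stats` and `fragment`, the top-down cutting strategy, and the reading of tree-width as the maximum fan-out of any element are all assumptions made for illustration. The sketch measures size, width, and depth of each subtree and emits the largest subtrees that satisfy all three bounds.

```python
import xml.etree.ElementTree as ET


def structural_stats(elem):
    """Return (size, width, depth) of the subtree rooted at elem.

    size  = number of elements in the subtree
    width = maximum number of children of any element (assumed reading
            of "tree-width"; the paper may define it differently)
    depth = number of levels, counting the root as level 1
    """
    size, width, depth = 1, len(elem), 1
    for child in elem:
        s, w, d = structural_stats(child)
        size += s
        width = max(width, w)
        depth = max(depth, 1 + d)
    return size, width, depth


def fragment(elem, max_size, max_width, max_depth):
    """Hypothetical top-down fragmentation under structural bounds.

    A subtree that satisfies all three bounds is emitted whole;
    otherwise the search descends into its children. Elements that
    violate the bounds are not emitted themselves; a real system
    would presumably keep them as a connecting skeleton.
    """
    size, width, depth = structural_stats(elem)
    if size <= max_size and width <= max_width and depth <= max_depth:
        return [elem]
    fragments = []
    for child in elem:
        fragments.extend(fragment(child, max_size, max_width, max_depth))
    return fragments


# Small example document (made up for this sketch).
doc = "<a><b><c/><d/></b><e><f><g/></f></e><h/></a>"
root = ET.fromstring(doc)

print(structural_stats(root))  # (8, 3, 4)
print([f.tag for f in fragment(root, max_size=3, max_width=2, max_depth=2)])
# ['b', 'f', 'h']
```

Recomputing the statistics at every level keeps the sketch short but costs quadratic time in the worst case; the structure histograms mentioned in the abstract presumably serve to estimate such properties without traversing the document repeatedly.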