Scaling XML query processing: distribution, localization and pruning

Authors:
Patrick Kling;M. Tamer Özsu;Khuzaima Daudjee
Affiliations:
Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada N2L 3G1;Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada N2L 3G1;Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada N2L 3G1
Venue:
Distributed and Parallel Databases
Year:
2011


Abstract

Distributing data collections by fragmenting them is an effective way of improving the scalability of a database system. While the distribution of relational data is well understood, the unique characteristics of the XML data and query model present challenges that require different distribution techniques. In this paper, we show how XML data can be fragmented horizontally and vertically. Based on these fragmentation schemes, we propose solutions to two of the problems encountered in distributed query processing and optimization on XML data, namely localization and pruning. Localization takes a fragmentation-unaware query plan and converts it into a distributed query plan that can be executed at the sites that hold XML data fragments in a distributed system. We then show how the resulting distributed query plan can be pruned so that only sites that can contribute to the query result are accessed. We demonstrate that our techniques can be integrated into a real-life XML database system and that they significantly improve the performance of distributed query execution.
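
The two steps named in the abstract, localization and pruning, can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a purely horizontal fragmentation in which each fragment is defined by a single equality predicate on a hypothetical `region` value, and every name in it (`Fragment`, `Query`, `localize`, `prune`, the site names, the path) is invented for illustration. Vertical fragmentation is not covered.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Fragment:
    """Hypothetical horizontal fragment: holds the items of one region."""
    site: str
    region: str


@dataclass(frozen=True)
class Query:
    """Hypothetical query: a path plus an optional equality predicate."""
    path: str
    region: Optional[str] = None  # None means "no constraint on region"


# A distributed plan here is simply one (fragment, subquery) pair per site;
# the final result would be the union of the subquery results.
Plan = List[Tuple[Fragment, Query]]


def localize(query: Query, fragments: List[Fragment]) -> Plan:
    """Turn a fragmentation-unaware query into one subquery per fragment."""
    return [(fragment, query) for fragment in fragments]


def prune(plan: Plan) -> Plan:
    """Drop subqueries whose fragment predicate contradicts the query."""
    return [
        (fragment, query)
        for fragment, query in plan
        if query.region is None or query.region == fragment.region
    ]


fragments = [
    Fragment(site="site1", region="africa"),
    Fragment(site="site2", region="asia"),
    Fragment(site="site3", region="europe"),
]
query = Query(path="/site/regions//item/name", region="asia")

localized = localize(query, fragments)
pruned = prune(localized)
sites_to_contact = [fragment.site for fragment, _ in pruned]
```

In this sketch the localized plan touches all three sites, while the pruned plan contacts only the site whose fragment predicate is compatible with the query. A query without a `region` constraint cannot be pruned and still goes to every site.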