A sampling approach for XML query selectivity estimation

Authors:
Cheng Luo;Zhewei Jiang;Wen-Chi Hou;Feng Yu;Qiang Zhu
Affiliations:
Coppin State University, Baltimore, MD;Southern Illinois University, Carbondale, IL;Southern Illinois University, Carbondale, IL;Southern Illinois University, Carbondale, IL;University of Michigan-Dearborn, Dearborn, MI
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009

Citing 23
Cited 2

Processing aggregate relational queries with hard time constraints

SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
Practical selectivity estimation through adaptive sampling

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Fixed-precision estimation of join selectivity

PODS '93 Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Bifocal sampling for skew-resistant join size estimation

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
On random sampling over joins

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Statistical estimators for relational algebra expressions

Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
StatiX: making XML count

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Holistic twig joins: optimal XML pattern matching

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Statistical synopses for graph-structured XML databases

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Counting Twig Matches in a Tree

Proceedings of the 17th International Conference on Data Engineering
Sampling-Based Estimation of the Number of Distinct Values of an Attribute

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
The XML benchmark project

The XML benchmark project
Containment join size estimation: models and methods

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Selectivity Estimation for XML Twigs

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Efficient processing of XML twig queries with OR-predicates

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Approximate XML query answers

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
On boosting holism in XML twig pattern matching using structural indexing techniques

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
From region encoding to extended dewey: on efficient processing of XML twig pattern matching

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Tree-pattern queries on a lightweight XML processor

VLDB '05 Proceedings of the 31st international conference on Very large data bases
XSEED: Accurate and Fast Cardinality Estimation for XPath Queries

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Estimating XML Structural Join Size Quickly and Economically

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Introduction to Probability Models, Ninth Edition

Introduction to Probability Models, Ninth Edition
Structure and value synopses for XML data graphs

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

XMin: Minimizing Tree Pattern Queries with Minimality Guarantee

World Wide Web
A gossip-based approach for Internet-scale cardinality estimation of XPath queries over distributed semistructured data

The VLDB Journal — The International Journal on Very Large Data Bases

Quantified Score

Hi-index	0.00

Visualization

Abstract

As the Extensible Markup Language (XML) rapidly establishes itself as the de facto standard for presenting, storing, and exchanging data on the Internet, large volume of XML data and their supporting facilities start to surface. A fast and accurate selectivity estimation mechanism is of practical importance because selectivity estimation plays a fundamental role in XML query optimization. Recently proposed techniques are all based on some forms of structure synopses that could be time-consuming to build and not effective for summarizing complex structure relationships. In this research, we propose an innovative sampling method that can capture the tree structures and intricate relationships among nodes in a simple and effective way. The derived sample tree is stored as a synopsis for selectivity estimation. Extensive experimental results show that, in comparison with the state-of-the-art structure synopses, specifically the TreeSketch and Xseed synopses, our sample tree synopsis applies to a broader range of query types, requires several orders of magnitude less construction time, and generates estimates with considerably better precision for complex datasets.