Selectivity Estimation for XML Twigs
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Twig queries represent the building blocks of declarative query languages over XML data. A twig query describes a complex traversal of the document graph and generates a set of element tuples based on the intertwined evaluation (i.e., join) of multiple path expressions. Estimating the result cardinality of twig queries or, equivalently, the number of tuples in such a structural (path-based) join, is a fundamental problem that arises in the optimization of declarative queries over XML. It is crucial, therefore, to develop concise synopsis structures that summarize the document graph and enable such selectivity estimates within the time and space constraints of the optimizer. In this paper, we propose novel summarization and estimation techniques for estimating the selectivity of twig queries with complex XPath expressions over tree-structured data. Our approach is based on the XSKETCH model, augmented with new types of distribution information for capturing complex correlation patterns across structural joins. Briefly, the key idea is to represent joins as points in a multidimensional space of path counts that capture aggregate information on the contents of the resulting element tuples. We develop a systematic framework that combines distribution information with appropriate statistical assumptions in order to provide selectivity estimates for twig queries over concise XSKETCH synopses, and we describe an efficient algorithm for constructing an accurate summary for a given space budget. Implementation results with both synthetic and real-life data sets verify the effectiveness of our approach and demonstrate its benefits over earlier techniques.
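To make the estimation problem concrete, here is a minimal sketch on a toy document. It is an illustration of the general idea only, not the paper's XSKETCH algorithm. It assumes a hypothetical twig `a[b][c]` that binds each `a` element together with one `b` child and one `c` child. The data in `A_ELEMENTS` and all function names are invented for this example. The sketch compares an estimate built from per-branch averages, which treats the two branches as independent, with one built from a distribution over (b-count, c-count) points.

```python
from collections import Counter

# Hypothetical toy document: each entry describes one `a` element as
# (number of `b` children, number of `c` children).
A_ELEMENTS = [(1, 1), (1, 1), (4, 4), (0, 0)]


def exact_twig_count(elements):
    """Exact number of (a, b, c) binding tuples for the twig a[b][c]."""
    return sum(b * c for b, c in elements)


def independence_estimate(elements):
    """Estimate using only per-branch averages (branches assumed independent)."""
    n = len(elements)
    avg_b = sum(b for b, _ in elements) / n
    avg_c = sum(c for _, c in elements) / n
    return n * avg_b * avg_c


def count_point_estimate(elements):
    """Estimate from the frequency of each (b_count, c_count) point."""
    points = Counter(elements)  # point -> number of `a` elements at that point
    return sum(freq * b * c for (b, c), freq in points.items())


if __name__ == "__main__":
    print("exact:", exact_twig_count(A_ELEMENTS))
    print("independence:", independence_estimate(A_ELEMENTS))
    print("count points:", count_point_estimate(A_ELEMENTS))
```

In this toy data the `b` and `c` counts are correlated, so the averages-only estimate undercounts, while the point distribution recovers the exact count because nothing is compressed. A real synopsis has to approximate that distribution within a space budget, which is where the accuracy trade-off comes in.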