Cost-based XML query optimization requires accurate estimates of the selectivity of path expressions, and other interactive and Internet applications can benefit from such estimates as well. Although a number of estimation techniques have been proposed in the literature, almost none of them guarantees estimation accuracy within a given space limit. In addition, most of them assume that the XML data are more or less static, i.e., that updates are rare. In this paper, we present a framework for XML path selectivity estimation in a dynamic context. Specifically, we propose a novel data structure, the bloom histogram, which approximates the XML path frequency distribution within a small space budget and supports accurate path selectivity estimation. We derive an upper bound on its estimation error and discuss the trade-off between accuracy and space. To update bloom histograms efficiently when the underlying XML data change, a dynamic summary layer keeps exact or more detailed XML path information. Extensive experiments demonstrate that the new solution achieves significantly higher accuracy in even less space than previous methods, in both static and dynamic environments.
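To make the idea of a bloom histogram more concrete, the following is a minimal Python sketch, not the construction from the paper. It assumes one plausible reading of the abstract: paths are sorted by frequency, grouped into buckets, and each bucket stores a Bloom filter over its paths together with the bucket's mean frequency. The names `BloomFilter`, `build_bloom_histogram`, and `estimate`, the equal-size bucketing, and the parameters `bits_per_bucket` and `num_hashes` are all illustrative assumptions.

```python
import hashlib
from typing import Dict, List, Tuple


class BloomFilter:
    """Plain Bloom filter over strings; the bit array is held in a Python int."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str) -> List[int]:
        # Derive one bit position per hash function by salting SHA-256.
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


Bucket = Tuple[BloomFilter, float]


def build_bloom_histogram(
    path_counts: Dict[str, int],
    num_buckets: int,
    bits_per_bucket: int = 256,  # assumed space budget per bucket
    num_hashes: int = 4,         # assumed number of hash functions
) -> List[Bucket]:
    """Group paths of similar frequency; keep (filter, mean frequency) per group."""
    ranked = sorted(path_counts.items(), key=lambda kv: kv[1])
    if not ranked:
        return []
    size = -(-len(ranked) // num_buckets)  # ceiling division
    histogram: List[Bucket] = []
    for start in range(0, len(ranked), size):
        group = ranked[start:start + size]
        bloom = BloomFilter(bits_per_bucket, num_hashes)
        for path, _ in group:
            bloom.add(path)
        mean = sum(count for _, count in group) / len(group)
        histogram.append((bloom, mean))
    return histogram


def estimate(histogram: List[Bucket], path: str) -> float:
    """Return the mean frequency of the first bucket whose filter reports the path."""
    for bloom, mean in histogram:
        if path in bloom:
            return mean
    return 0.0  # no filter matched: treat the path as absent


counts = {"/a": 100, "/a/b": 98, "/a/c": 5, "/a/c/d": 3}
hist = build_bloom_histogram(counts, num_buckets=2)
print(estimate(hist, "/a/c"))
```

In this sketch the estimate for a path can be off for two reasons: the bucket mean replaces the exact count, and a Bloom filter false positive in an earlier bucket can return the wrong bucket's mean. Both shrink as more buckets and more bits are allotted, which illustrates the kind of accuracy-versus-space trade-off the abstract refers to.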