Equi-depth multidimensional histograms
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Balancing histogram optimality and practicality for query result size estimation
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Query size estimation by adaptive sampling (extended abstract)
PODS '90 Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Selectivity estimation for Boolean queries
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
On supporting containment queries in relational database management systems
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Selectivity Estimation in the Presence of Alphanumeric Correlations
ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Counting Twig Matches in a Tree
Proceedings of the 17th International Conference on Data Engineering
Optimal Histograms with Quality Guarantees
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Estimating the Selectivity of XML Path Expressions for Internet Scale Applications
Proceedings of the 27th International Conference on Very Large Data Bases
Universality of Serial Histograms
VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases
One-dimensional and multi-dimensional substring selectivity estimation
The VLDB Journal — The International Journal on Very Large Data Bases
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
The XML benchmark project
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Multi-level operator combination in XML query processing
Proceedings of the eleventh international conference on Information and knowledge management
EDBT '02 Proceedings of the Workshops XMLDM, MDDE, and YRWS on XML-Based Data Management and Multimedia Engineering-Revised Papers
The VLDB Journal — The International Journal on Very Large Data Bases
Containment join size estimation: models and methods
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
TIMBER: a native system for querying XML
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Accelerating XPath evaluation in any RDBMS
ACM Transactions on Database Systems (TODS)
Selectivity Estimation for XML Twigs
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
IMAX: Incremental Maintenance of Schema-Based XML Statistics
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Semantic Similarity Search on Semistructured Data with the XXL Search Engine
Information Retrieval
CXHist: an on-line classification-based histogram for XML string selectivity estimation
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Accelerating queries by pruning XML documents
Data & Knowledge Engineering
Applying cosine series to join size estimation
Proceedings of the 14th ACM international conference on Information and knowledge management
Cost-based optimization in DB2 XML
IBM Systems Journal
XSKETCH synopses for XML data graphs
ACM Transactions on Database Systems (TODS)
Structure and value synopses for XML data graphs
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Efficiently Querying Large XML Data Repositories: A Survey
IEEE Transactions on Knowledge and Data Engineering
MARS: a system for publishing XML from mixed and redundant storage
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Mixed mode XML query processing
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Holistic twig joins on indexed XML documents
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
A relational model for XML structural joins and their size estimations
Knowledge and Information Systems
A cost-based join selection for XML twig content-based queries
DataX '08 Proceedings of the 2008 EDBT workshop on Database technologies for handling XML information on the web
Enabling XPath Optional Axes Cardinality Estimation Using Path Synopses
ADBIS '08 Proceedings of the 12th East European conference on Advances in Databases and Information Systems
ROX: run-time optimization of XQueries
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Synopsis based load shedding in XML streams
Proceedings of the 2009 EDBT/ICDT Workshops
Statistics-based parallelization of XPath queries in shared memory systems
Proceedings of the 13th International Conference on Extending Database Technology
Query and update through XML views
DNIS'07 Proceedings of the 5th international conference on Databases in networked information systems
Towards a comprehensive assessment for selectivity estimation approaches of XML queries
International Journal of Web Engineering and Technology
Updating XML views and querying XML views with update syntax
International Journal of Computational Science and Engineering
Caching frequent XML query patterns
APWeb'06 Proceedings of the 2006 international conference on Advanced Web and Network Technologies, and Applications
A decomposition-based probabilistic framework for estimating the selectivity of XML twig queries
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
A histogram-based selectivity estimator for skewed XML data
DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications
A statistical approach for XML query size estimation
EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
Applying cosine series to XML structural join size estimation
DEXA'06 Proceedings of the 17th international conference on Database and Expert Systems Applications
Top-K data source selection for keyword queries over multiple XML data sources
Journal of Information Science
Histograms as statistical estimators for aggregate queries
Information Systems
The VLDB Journal — The International Journal on Very Large Data Bases
Estimating the sizes of query results, and of intermediate results, is crucial to many aspects of query processing. In particular, it is necessary for effective query optimization. Even at the user level, predictions of the total result size can be valuable in "next-step" decisions, such as query refinement. This paper proposes a technique to obtain query result size estimates effectively in an XML database.

Queries in XML frequently specify structural patterns, requiring specific relationships between selected elements. Whereas traditional techniques can estimate the number of nodes (XML elements) that will satisfy a node-specific predicate in the query pattern, such estimates cannot easily be combined to provide estimates for the entire query pattern, since element occurrences are expected to have high correlation.

We propose a solution based on a novel histogram encoding of element occurrence position. With such position histograms, we are able to obtain estimates of sizes for complex pattern queries, as well as for simpler intermediate patterns that may be evaluated in alternative query plans, by means of a position histogram join (pH-join) algorithm that we introduce. We extend our technique to exploit schema information regarding allowable structure (the no-overlap property) through the use of a coverage histogram.

We present an extensive experimental evaluation using several XML data sets, both real and synthetic, with a variety of queries. Our results demonstrate that accurate and robust estimates can be achieved, with limited space, and at a minuscule computational cost. These techniques have been implemented in the context of the TIMBER native XML database [22] at the University of Michigan.
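To make the idea of a position histogram concrete, the following Python sketch may help. It is an illustration under stated assumptions, not the pH-join algorithm itself, whose details the abstract does not give. It assumes each element carries a (start, end) position pair from a document-order numbering, so that one element is an ancestor of another exactly when its interval strictly contains the other's. The function names, the grid size, and the sample intervals are all hypothetical. Rather than a single estimate, the sketch returns a lower and an upper bound on the number of ancestor-descendant pairs, using only the two histograms.

```python
from collections import Counter


def build_position_histogram(intervals, max_pos, grid):
    """Count elements per (start bucket, end bucket) cell of a grid x grid histogram."""
    # Bucket width chosen so that positions 0..max_pos map to buckets 0..grid-1.
    width = (max_pos + grid) // grid
    hist = Counter()
    for start, end in intervals:
        hist[(start // width, end // width)] += 1
    return hist


def bound_ancestor_descendant(anc_hist, desc_hist):
    """Bound the number of (ancestor, descendant) pairs from the histograms alone.

    Illustrative only: the real pH-join would produce an estimate, not bounds.
    """
    lower = upper = 0
    for (i, j), a in anc_hist.items():
        for (k, l), d in desc_hist.items():
            # Containment is possible only if the descendant cell starts no
            # earlier and ends no later than the ancestor cell.
            if k >= i and l <= j:
                upper += a * d
                # Strictly later start bucket and strictly earlier end bucket
                # guarantee containment for every pair drawn from the two cells.
                if k > i and l < j:
                    lower += a * d
    return lower, upper


def exact_count(ancestors, descendants):
    """Brute-force count of containing pairs, used only to check the bounds."""
    return sum(
        1
        for s, e in ancestors
        for s2, e2 in descendants
        if s < s2 and e2 < e
    )


# Hypothetical sample data: three "section" elements and five "para" elements.
sections = [(0, 19), (1, 8), (9, 18)]
paras = [(2, 3), (4, 5), (10, 11), (12, 13), (14, 15)]

section_hist = build_position_histogram(sections, max_pos=19, grid=4)
para_hist = build_position_histogram(paras, max_pos=19, grid=4)

lower, upper = bound_ancestor_descendant(section_hist, para_hist)
exact = exact_count(sections, paras)
print(lower, exact, upper)
```

The gap between the two bounds comes entirely from cell pairs that share a start or end bucket, where the histogram cannot tell whether containment holds. A practical estimator has to apportion those boundary cells in some way, and a finer grid narrows the gap at the cost of more space.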