Equi-depth multidimensional histograms
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Query size estimation by adaptive sampling
Selected papers of the 9th annual ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Balancing histogram optimality and practicality for query result size estimation
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Selectively estimation for Boolean queries
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
On supporting containment queries in relational database management systems
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Selectivity Estimation in the Presence of Alphanumeric Correlations
ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Counting Twig Matches in a Tree
Proceedings of the 17th International Conference on Data Engineering
Optimal Histograms with Quality Guarantees
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Estimating the Selectivity of XML Path Expressions for Internet Scale Applications
Proceedings of the 27th International Conference on Very Large Data Bases
Universality of Serial Histograms
VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases
One-dimensional and multi-dimensional substring selectivity estimation
The VLDB Journal — The International Journal on Very Large Data Bases
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
The XML benchmark project
The VLDB Journal — The International Journal on Very Large Data Bases
Benefits of path summaries in an XML query optimizer supporting multiple access methods
VLDB '05 Proceedings of the 31st international conference on Very large data bases
The history of histograms (abridged)
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Structural summaries for efficient XML query processing
Ph.D. '08 Proceedings of the 2008 EDBT Ph.D. workshop
Enhancing histograms by tree-like bucket indices
The VLDB Journal — The International Journal on Very Large Data Bases
Advanced structural joins using element distribution
Information Sciences: an International Journal
Efficient fragmentation of large XML documents
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Histograms as statistical estimators for aggregate queries
Information Systems
Hi-index | 0.01 |
Estimating the sizes of query results, and intermediate results, is crucial to many aspects of query processing. In particular, it is necessary for effective query optimization. Even at the user level, predictions of the total result size can be valuable in "next-step" decisions, such as query refinement. This paper proposes a technique to obtain query result size estimates effectively in an XML database.Queries in XML frequently specify structural patterns, requiring specific relationships between selected elements. Whereas traditional techniques can estimate the number of nodes (XML elements) that will satisfy a node-specific predicate in the query pattern, such estimates cannot easily be combined to provide estimates for the entire query pattern, since element occurrences are expected to have high correlation.We propose a solution based on a novel histogram encoding of element occurrence position. With such position histograms, we are able to obtain estimates of sizes for complex pattern queries, as well as for simpler intermediate patterns that may be evaluated in alternative query plans, by means of a position histogram join (pH-join) algorithm that we introduce. We extend our technique to exploit schema information regarding allowable structure (the no-overlap property) through the use of a coverage histogram.We present an extensive experimental evaluation using several XML data sets, both real and synthetic, with a variety of queries. Our results demonstrate that accurate and robust estimates can be achieved, with limited space, and at a minuscule computational cost. These techniques have been implemented in the context of the TIMBER native XML database (available at http://www.eecs.umich.edu/db/timber) at the University of Michigan.