The Extensible Markup Language (XML) is gaining widespread use as a format for data exchange and storage on the World Wide Web. Optimizing the execution plans of queries over XML data requires accurate selectivity estimates for path expressions. Selectivity estimation of XML path expressions is usually based on summary statistics about the structure of the underlying XML repository, and all previous methods require an off-line scan of the repository to collect these statistics. In this paper, we propose XPathLearner, a method for estimating the selectivity of the most commonly used types of path expressions without looking at the XML data. XPathLearner gathers and refines its statistics on-line using query feedback. This makes it especially suited to Internet-scale applications, where the underlying XML repository is either inaccessible or too large to be scanned in its entirety. Besides being on-line, our method has two other novel features: (a) XPathLearner is workload-aware in collecting statistics and can therefore be more accurate than more costly off-line methods under tight memory constraints, and (b) XPathLearner automatically adjusts its statistics using query feedback when the underlying XML data change. We demonstrate the estimation accuracy of our method empirically on several real data sets.
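The core idea of learning path statistics from query feedback instead of a data scan can be illustrated with a small sketch. The abstract does not specify XPathLearner's model or update rule, so the following is a minimal sketch under two assumptions. First, it uses a first-order Markov model over parent/child tag pairs for simple child-axis paths such as `/book/author/name`. Second, it applies a log-space multiplicative correction chosen for illustration; this is not necessarily the paper's update rule. The class `FeedbackPathHistogram` and its parameters `rate`, `default`, and `floor` are hypothetical names.

```python
class FeedbackPathHistogram:
    """Hypothetical feedback-tuned estimator for simple child-axis paths like /a/b/c.

    No data scan: every statistic starts at `default` and is corrected from the
    true result sizes reported after queries run (assumed model, not XPathLearner's).
    """

    def __init__(self, rate=0.5, default=1.0, floor=1e-3):
        self.rate = rate        # learning rate in (0, 1]; 1.0 applies the full correction
        self.default = default  # prior count for any tag or pair not yet seen in feedback
        self.floor = floor      # stand-in for a 0 result so counts stay positive
        self.tags = {}          # tag -> estimated number of elements with that tag
        self.pairs = {}         # (parent, child) -> estimated number of parent/child edges

    @staticmethod
    def _split(path):
        tags = [t for t in path.split("/") if t]
        if not tags:
            raise ValueError("empty path")
        return tags

    def estimate(self, path):
        tags = self._split(path)
        if len(tags) == 1:
            return self.tags.get(tags[0], self.default)
        # First-order Markov assumption:
        # f(t1/.../tn) ~= f(t1/t2) * prod_{i=2..n-1} f(t_i/t_{i+1}) / f(t_i)
        est = self.pairs.get((tags[0], tags[1]), self.default)
        for parent, child in zip(tags[1:], tags[2:]):
            est *= self.pairs.get((parent, child), self.default) / self.tags.get(parent, self.default)
        return est

    def feedback(self, path, actual):
        """Correct the statistics this query touched, given its true result size."""
        tags = self._split(path)
        ratio = max(actual, self.floor) / self.estimate(path)
        if len(tags) == 1:
            table, keys, k = self.tags, {tags[0]}, 1
        else:
            table, keys, k = self.pairs, set(zip(tags, tags[1:])), len(tags) - 1
        # Scale each distinct key once by ratio**(rate/k). Key occurrences in the
        # Markov product sum to k, so the new estimate is old_estimate * ratio**rate.
        step = ratio ** (self.rate / k)
        for key in keys:
            table[key] = table.get(key, self.default) * step


# Usage: feed back true counts of short paths, then estimate a longer one.
h = FeedbackPathHistogram(rate=1.0)
for p, count in [("/book", 100), ("/author", 250), ("/book/author", 250), ("/author/name", 250)]:
    h.feedback(p, count)
print(h.estimate("/book/author/name"))  # 250.0
```

The sketch stores only the tags and pairs that appear in query feedback, which loosely mirrors the workload-aware property. A real system would also need an eviction policy to stay within a memory budget, and the sketch omits that. With `rate < 1`, repeated feedback on the same path converges gradually. This lets the statistics track data that change over time rather than overreacting to a single query.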