Three partition refinement algorithms
SIAM Journal on Computing
Probabilistic reasoning in intelligent systems: networks of plausible inference
Probabilistic reasoning in intelligent systems: networks of plausible inference
Improved histograms for selectivity estimation of range predicates
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
A query language and optimization techniques for unstructured data
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Approximate computation of multidimensional aggregates of sparse data using wavelets
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Independence is good: dependency-based histogram synopses for high-dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Covering indexes for branching path queries
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Estimating Answer Sizes for XML Queries
EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
Counting Twig Matches in a Tree
Proceedings of the 17th International Conference on Data Engineering
DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Optimal Histograms with Quality Guarantees
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Efficient Filtering of XML Documents for Selective Dissemination of Information
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Estimating the Selectivity of XML Path Expressions for Internet Scale Applications
Proceedings of the 27th International Conference on Very Large Data Bases
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
Approximate Query Processing: Taming the TeraBytes
Proceedings of the 27th International Conference on Very Large Data Bases
Selectivity Estimation Without the Attribute Value Independence Assumption
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
The VLDB Journal — The International Journal on Very Large Data Bases
Containment join size estimation: models and methods
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Exploiting Local Similarity for Indexing Paths in Graph-Structured Data
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Building XML statistics for the hidden web
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Selectivity Estimation for XML Twigs
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
IMAX: Incremental Maintenance of Schema-Based XML Statistics
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
XSEED: Accurate and Fast Cardinality Estimation for XPath Queries
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
XCluster Synopses for Structured XML Content
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
XPathLearner: an on-line self-tuning Markov histogram for XML path selectivity estimation
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Bloom histogram: path selectivity estimation for XML data with updates
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Graph summarization with bounded error
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Structural summaries for efficient XML query processing
Ph.D. '08 Proceedings of the 2008 EDBT Ph.D. workshop
A cost-based join selection for XML twig content-based queries
DataX '08 Proceedings of the 2008 EDBT workshop on Database technologies for handling XML information on the web
Dependable cardinality forecasts for XQuery
Proceedings of the VLDB Endowment
Retrieving XML data from heterogeneous sources through vague querying
ACM Transactions on Internet Technology (TOIT)
Refining Keyword Queries for XML Retrieval by Combining Content and Structure
ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
Top-k Answers to Fuzzy XPath Queries
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
Mining graph patterns efficiently via randomized summaries
Proceedings of the VLDB Endowment
GConnect: a connectivity index for massive disk-resident graphs
Proceedings of the VLDB Endowment
Exploring XML web collections with DescribeX
ACM Transactions on the Web (TWEB)
Efficient query answering in probabilistic RDF graphs
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Optimizing incremental maintenance of minimal bisimulation of cyclic graphs
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I
CP-index: on the efficient indexing of large graphs
Proceedings of the 20th ACM international conference on Information and knowledge management
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
Fast answering of XPath query workloads on web collections
XSym'07 Proceedings of the 5th international conference on Database and XML Technologies
Vague queries on peer-to-peer XML databases
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Histograms as statistical estimators for aggregate queries
Information Systems
Algebraic incremental maintenance of XML views
ACM Transactions on Database Systems (TODS)
Hi-index | 0.00 |
Effective support for XML query languages is becoming increasingly important with the emergence of new applications that access large volumes of XML data. All existing proposals for querying XML (e.g., XQuery) rely on a pattern-specification language that allows (1) path navigation and branching through the label structure of the XML data graph, and (2) predicates on the values of specific path/branch nodes, in order to reach the desired data elements. Clearly, optimizing such queries requires approximating the result cardinality of the referenced paths and hence hinges on the existence of concise synopsis structures that enable accurate compile-time selectivity estimates for complex path expressions over the base XML data. In this article, we introduce a novel approach to building and using statistical summaries of large XML data graphs for effective path-expression selectivity estimation. Our proposed graph-synopsis model (termed XSketch) exploits localized graph stability and value-distribution summaries (e.g., histograms) to accurately approximate (in limited space) the path and branching distribution, as well as the complex correlation patterns that can exist between and across path structure and element values in the data graph. To the best of our knowledge, ours is the first work to address this timely problem in the most general setting of graph-structured XML data with values, and complex (branching) path expressions.