XSKETCH synopses for XML data graphs

Authors:
Neoklis Polyzotis; Minos Garofalakis
Affiliations:
University of California, Santa Cruz, Santa Cruz, CA; Intel Research Berkeley, Berkeley, CA
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2006

Abstract

Effective support for XML query languages is becoming increasingly important with the emergence of new applications that access large volumes of XML data. All existing proposals for querying XML (e.g., XQuery) rely on a pattern-specification language that allows (1) path navigation and branching through the label structure of the XML data graph, and (2) predicates on the values of specific path/branch nodes, in order to reach the desired data elements. Clearly, optimizing such queries requires approximating the result cardinality of the referenced paths and hence hinges on the existence of concise synopsis structures that enable accurate compile-time selectivity estimates for complex path expressions over the base XML data. In this article, we introduce a novel approach to building and using statistical summaries of large XML data graphs for effective path-expression selectivity estimation. Our proposed graph-synopsis model (termed XSketch) exploits localized graph stability and value-distribution summaries (e.g., histograms) to accurately approximate (in limited space) the path and branching distribution, as well as the complex correlation patterns that can exist between and across path structure and element values in the data graph. To the best of our knowledge, ours is the first work to address this timely problem in the most general setting of graph-structured XML data with values, and complex (branching) path expressions.
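To make the idea of "localized graph stability" concrete, here is a minimal sketch under simplifying assumptions. It is not the XSketch construction or estimation framework. It builds a label-split summary of a small tree, with one summary node per tag. Each summary edge (u, v) gets two flags. It is backward-stable if every v-element has its parent in u, and forward-stable if every u-element has at least one v-child. The sketch then estimates a child-axis path count with a simple first-order uniformity rule. Value histograms, branching predicates, general graphs, and summary refinement are all omitted. The names `doc`, `build_label_summary`, `estimate_path`, and `exact_path_count` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: a tree of (element_id, label, parent_id) triples.
# XSketch targets general graphs with values; this sketch covers label-only trees.
doc = [
    (0, "bib", None),
    (1, "book", 0), (2, "paper", 0),
    (3, "section", 1), (4, "section", 1), (5, "section", 2),
    (6, "figure", 5), (7, "figure", 5), (8, "figure", 5),
]


def build_label_summary(elements):
    """Group elements by label; annotate each summary edge (u, v) with stability flags."""
    label_of = {eid: label for eid, label, _ in elements}
    extent = Counter(label for _, label, _ in elements)  # |extent(label)|
    edge_children = Counter()  # (u, v) -> number of v-elements whose parent is a u-element
    parents_with_child = defaultdict(set)  # (u, v) -> distinct u-elements having a v-child
    for eid, label, pid in elements:
        if pid is None:
            continue
        u = label_of[pid]
        edge_children[(u, label)] += 1
        parents_with_child[(u, label)].add(pid)
    stability = {}
    for (u, v), n_children in edge_children.items():
        backward = n_children == extent[v]  # every v-element has its parent in u
        forward = len(parents_with_child[(u, v)]) == extent[u]  # every u-element has a v-child
        stability[(u, v)] = (backward, forward)
    return extent, edge_children, stability


def estimate_path(path, extent, edge_children):
    """Estimate how many elements the child-axis path l1/l2/.../lk reaches.

    Assumes the v-children on edge (u, v) are spread uniformly over all u-elements,
    regardless of how those u-elements were reached. Under that rule the estimate
    is exact whenever every edge on the path is backward-stable.
    """
    est = float(extent.get(path[0], 0))
    for u, v in zip(path, path[1:]):
        if extent.get(u, 0) == 0:
            return 0.0
        est = edge_children.get((u, v), 0) * est / extent[u]
    return est


def exact_path_count(path, elements):
    """Brute-force ground truth: elements whose ancestor labels spell the path."""
    label_of = {eid: label for eid, label, _ in elements}
    parent_of = {eid: pid for eid, _, pid in elements}
    count = 0
    for eid in label_of:
        node = eid
        for label in reversed(path):
            if node is None or label_of[node] != label:
                break
            node = parent_of[node]
        else:
            count += 1  # every label on the path matched
    return count


extent, edge_children, stability = build_label_summary(doc)
for path in (["section", "figure"], ["bib", "book", "section"],
             ["bib", "book", "section", "figure"]):
    print("/".join(path), estimate_path(path, extent, edge_children),
          exact_path_count(path, doc))
```

On this toy tree, the estimates for `section/figure` (3.0) and `bib/book/section` (2.0) match the true counts. For `bib/book/section/figure`, the estimate is 2.0 but the true count is 0. The edge (book, section) is not backward-stable, and all figures sit under the paper's section, so the uniformity rule spreads them over sections they never occur in. Splitting the `section` node by its parent's label would make this path's estimate exact. That is the kind of stability-increasing refinement a summary-building algorithm might apply.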