Effective support for XML query languages is becoming increasingly important with the emergence of new applications that access large volumes of XML data. All existing proposals for querying XML (e.g., XQuery) rely on a pattern-specification language that allows path navigation and branching through the XML data graph in order to reach the desired data elements. Optimizing such queries depends crucially on the existence of concise synopsis structures that enable accurate compile-time selectivity estimates for complex path expressions over graph-structured XML data. In this paper, we introduce a novel approach to building and using statistical summaries of large XML data graphs for effective path-expression selectivity estimation. Our proposed graph-synopsis model (termed XSKETCH) exploits localized graph stability to accurately approximate (in limited space) the path and branching distribution in the data graph. To estimate the selectivities of complex path expressions over concise XSKETCH synopses, we develop an estimation framework that relies on appropriate statistical (uniformity and independence) assumptions to compensate for the lack of detailed distribution information. Given our estimation framework, we show that the problem of building an accuracy-optimal XSKETCH for a given amount of space is NP-hard, and we propose an efficient heuristic algorithm based on greedy forward selection. Briefly, our algorithm constructs an XSKETCH synopsis by successive refinements of the label-split graph, the coarsest summary of the XML data graph. Our refinement operations act locally and attempt to capture important statistical correlations between data paths. Extensive experimental results with synthetic as well as real-life data sets verify the effectiveness of our approach. To the best of our knowledge, ours is the first work to address this timely problem in the most general setting of graph-structured data and complex (branching) path expressions.
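The Python sketch below makes these ingredients concrete. It builds the label-split graph of a small hypothetical document, checks one notion of local stability on a synopsis edge, and estimates the result size of a child-axis label path. It is a simplification under stated assumptions, not the XSKETCH implementation. The assumptions are as follows:

- The data form a tree, with no IDREF edges.
- Backward stability means that every element of a child synopsis node has its parent in the parent synopsis node.
- Estimation keeps only each synopsis edge's average fan-out, which is a plain uniformity assumption. The paper's estimation framework is richer.

The helper names (`summarize`, `split_by_parent`, `estimate`, and so on) and the parent-based split used as the refinement step are illustrative choices. They are not operations defined in the paper.

```python
from collections import defaultdict

# Hypothetical toy document, assumed to be a tree: node id -> (label, child ids).
DOC = {
    0: ("bib", [1, 2]),
    1: ("book", [3, 4, 5]),
    2: ("paper", [6]),
    3: ("author", []),
    4: ("author", []),
    5: ("title", []),
    6: ("author", [7]),
    7: ("email", []),
}


def parents(doc):
    """Map each non-root data node to its (unique, since doc is a tree) parent."""
    return {c: nid for nid, (_, children) in doc.items() for c in children}


def label_split(doc):
    """Coarsest partition: one synopsis node per element label, keyed as (label,)."""
    return {nid: (label,) for nid, (label, _) in doc.items()}


def split_by_parent(doc, key_of, label):
    """Illustrative local refinement (an assumption, not the paper's operator):
    split every synopsis node with this label by the synopsis node of each element's parent."""
    par = parents(doc)
    return {
        nid: (k + (key_of[par[nid]],) if k[0] == label and nid in par else k)
        for nid, k in key_of.items()
    }


def summarize(doc, key_of):
    """Build a graph synopsis from a partition; synopsis keys start with their label."""
    extent = defaultdict(set)  # synopsis node -> data node ids
    edges = defaultdict(int)   # (synopsis u, synopsis v) -> number of data edges
    for nid, (_, children) in doc.items():
        extent[key_of[nid]].add(nid)
        for c in children:
            edges[(key_of[nid], key_of[c])] += 1
    return extent, edges


def b_stable(u, v, extent, parent_of):
    """Backward stability (one common definition): every element of v has its parent in u."""
    return all(parent_of.get(n) in extent[u] for n in extent[v])


def estimate(path, extent, edges):
    """Estimate the result size of //path[0]/path[1]/... assuming uniformity:
    the reached part of a synopsis node has that node's average fan-out per edge."""
    reached = {u: float(len(ext)) for u, ext in extent.items() if u[0] == path[0]}
    for label in path[1:]:
        nxt = defaultdict(float)
        for (u, v), n in edges.items():
            if u in reached and v[0] == label:
                nxt[v] += reached[u] * n / len(extent[u])
        reached = nxt
    return sum(reached.values())


def true_count(path, doc):
    """Exact result size of the same child-axis path, by brute force."""
    frontier = [n for n, (lab, _) in doc.items() if lab == path[0]]
    for label in path[1:]:
        frontier = [c for n in frontier for c in doc[n][1] if doc[c][0] == label]
    return len(frontier)


if __name__ == "__main__":
    base = label_split(DOC)
    coarse = summarize(DOC, base)
    refined = summarize(DOC, split_by_parent(DOC, base, "author"))
    for path in (["book", "author", "email"], ["paper", "author", "email"]):
        print("/".join(path),
              f"coarse={estimate(path, *coarse):.3f}",
              f"refined={estimate(path, *refined):.3f}",
              f"true={true_count(path, DOC)}")
    # book/author/email coarse=0.667 refined=0.000 true=0
    # paper/author/email coarse=0.333 refined=1.000 true=1
```

On this toy document, the label-split graph places book authors and paper authors in a single `author` node. The uniform fan-out assumption therefore spreads the lone `email` element across both paths. Splitting that node by parent makes both incoming edges backward-stable, and the two estimates become exact. This is the kind of correlation that local refinements aim to capture. The greedy choice of which refinement to apply under a space budget is not shown.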