All existing proposals for querying XML (e.g., XQuery) rely on a pattern-specification language that allows (1) path navigation and branching through the label structure of the XML data graph, and (2) predicates on the values of specific path/branch nodes, in order to reach the desired data elements. Optimizing such queries depends crucially on the existence of concise synopsis structures that enable accurate compile-time selectivity estimates for complex path expressions over graph-structured XML data. In this paper, we extend our earlier work on structural XSKETCH synopses and propose an (augmented) XSKETCH synopsis model that exploits localized stability and value-distribution summaries (e.g., histograms) to accurately capture the complex correlation patterns that can exist between and across path structure and element values in the data graph. We develop a systematic XSKETCH estimation framework for complex path expressions with value predicates, and we propose an efficient heuristic algorithm based on greedy forward selection for building an effective XSKETCH for a given amount of space (which is, in general, an NP-hard optimization problem). Implementation results with both synthetic and real-life data sets verify the effectiveness of our approach.
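To give a feel for what greedy forward selection under a space budget looks like, the following is a minimal sketch on a much simpler problem than XSKETCH construction: choosing bucket boundaries for a one-dimensional histogram. It is not the algorithm from the paper. The function names (`sse`, `total_error`, `greedy_forward_selection`), the squared-error measure, and the use of a bucket count as the space budget are all assumptions made for illustration.

```python
from typing import List, Sequence


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (error of one bucket)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def total_error(freqs: Sequence[float], cuts: List[int]) -> float:
    """Error of a histogram whose buckets are split at the given cut positions."""
    bounds = [0] + sorted(cuts) + [len(freqs)]
    return sum(sse(freqs[lo:hi]) for lo, hi in zip(bounds, bounds[1:]))


def greedy_forward_selection(freqs: Sequence[float], max_buckets: int) -> List[int]:
    """Hypothetical helper: add one bucket boundary at a time, always taking
    the boundary that lowers the total error the most, until the space
    budget (number of buckets) is used up or no boundary helps."""
    cuts: List[int] = []
    while len(cuts) + 1 < max_buckets:
        candidates = [c for c in range(1, len(freqs)) if c not in cuts]
        if not candidates:
            break
        best = min(candidates, key=lambda c: total_error(freqs, cuts + [c]))
        if total_error(freqs, cuts + [best]) >= total_error(freqs, cuts):
            break  # no remaining refinement improves the synopsis
        cuts.append(best)
    return sorted(cuts)


if __name__ == "__main__":
    freqs = [1, 1, 1, 10, 10, 10, 5, 5]
    print(greedy_forward_selection(freqs, max_buckets=3))
```

Each step commits to the locally best refinement and never revisits it, which is what makes the approach a heuristic: it is cheap to run but carries no guarantee of finding the best synopsis for the budget.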