A fundamental problem in graph-structured databases is searching for substructures. Optimizing such searches depends on being able to estimate the frequency of substructures within a query graph. In this work, we present and evaluate two techniques for estimating the frequency of subgraphs from a summary of the data graph. In the first technique, we assume that edge occurrences on edge sequences are position-independent and summarize only the most informative dependencies. In the second technique, we prune small subgraphs using a valuation scheme that blends information about their importance and their estimation power. In both techniques, we assume conditional independence to estimate the frequencies of larger subgraphs. We validate the effectiveness of our techniques through experiments on real and synthetic datasets.
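To make the conditional independence assumption concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes a summary that stores only the counts of one-edge and two-edge label sequences, and it estimates the frequency of a longer edge sequence by chaining those counts, in the style of a first-order Markov estimate. The names `build_summary` and `estimate_path`, and the toy edge list, are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_summary(edges):
    """Summarize a labeled directed graph given as (src, label, dst) triples.

    Returns two counters (an assumed, simplified summary):
      single[a]    = number of edges labeled a
      pair[(a, b)] = number of two-edge sequences labeled a then b
    """
    single = Counter(label for _, label, _ in edges)

    # Index outgoing edges by source node so that follow-up edges are easy to find.
    outgoing = defaultdict(list)
    for src, label, dst in edges:
        outgoing[src].append((label, dst))

    pair = Counter()
    for _, first_label, mid in edges:
        for second_label, _ in outgoing[mid]:
            pair[(first_label, second_label)] += 1
    return single, pair


def estimate_path(labels, single, pair):
    """Estimate the frequency of an edge-label sequence.

    Assumes each edge depends only on the edge before it, so that
    f(e1..ek) ~ f(e1, e2) * product over i >= 3 of f(e_{i-1}, e_i) / f(e_{i-1}).
    """
    if not labels:
        return 0.0
    if len(labels) == 1:
        return float(single[labels[0]])

    estimate = float(pair[(labels[0], labels[1])])
    for prev, cur in zip(labels[1:], labels[2:]):
        if single[prev] == 0:
            return 0.0  # the sequence cannot occur if an inner label is absent
        estimate *= pair[(prev, cur)] / single[prev]
    return estimate


# Toy data graph (hypothetical): one 'a' edge followed by two 'b' -> 'c' branches.
edges = [
    (1, "a", 2),
    (2, "b", 3),
    (3, "c", 4),
    (2, "b", 5),
    (5, "c", 6),
]
single, pair = build_summary(edges)
estimated = estimate_path(["a", "b", "c"], single, pair)
```

On this toy graph the estimate for the sequence a, b, c can be compared against a direct count of matching paths. On real data the two will generally differ, because the independence assumption ignores dependencies that span more than two edges.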