Graph summaries for subgraph frequency estimation

Authors:
Angela Maduko;Kemafor Anyanwu;Amit Sheth;Paul Schliekelman
Affiliations:
Department of Computer Science, University of Georgia;Department of Computer Science, North Carolina State University;Kno.e.sis Center, Wright State University;Department of Statistics, University of Georgia
Venue:
ESWC '08: Proceedings of the 5th European Semantic Web Conference on The Semantic Web: Research and Applications
Year:
2008

Abstract

A fundamental problem in graph-structured databases is searching for substructures. A key step in optimizing such searches is estimating the frequency of the substructures that make up a query graph. In this work, we present and evaluate two techniques for estimating the frequency of subgraphs from a summary of the data graph. In the first technique, we assume that edge occurrences on edge sequences are position-independent and summarize only the most informative dependencies. In the second technique, we prune small subgraphs using a valuation scheme that blends information about their importance and estimation power. In both techniques, we assume conditional independence to estimate the frequencies of larger subgraphs. We validate the effectiveness of our techniques through experiments on real and synthetic datasets.
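As an illustration of the conditional-independence idea mentioned in the abstract, the following is a minimal sketch, not the authors' algorithm. It assumes the summary stores exact counts of edge-label paths of length at most k, and it estimates the frequency of a longer path by chaining overlapping length-k counts, as a Markov-style estimator would. The function names (`count_paths`, `estimate_path_frequency`), the triple-based edge representation, and the toy graph are all hypothetical choices made for this example.

```python
from collections import Counter


def count_paths(edges, max_len=2):
    """Count label sequences of directed paths with 1..max_len edges.

    `edges` is an iterable of (source, label, target) triples.
    This plays the role of a (hypothetical) summary of the data graph.
    """
    out = {}
    for src, label, dst in edges:
        out.setdefault(src, []).append((label, dst))

    counts = Counter()

    def walk(node, labels):
        # Extend the current path by every outgoing edge of `node`.
        for label, dst in out.get(node, []):
            seq = labels + (label,)
            counts[seq] += 1
            if len(seq) < max_len:
                walk(dst, seq)

    for node in list(out):
        walk(node, ())
    return counts


def estimate_path_frequency(summary, labels, k=2):
    """Estimate the frequency of a label path longer than k.

    Assumes each edge is conditionally independent of everything
    except the k-1 edges that precede it, so that for k = 2:
        f(a, b, c) ~= f(a, b) * f(b, c) / f(b)
    """
    labels = tuple(labels)
    if len(labels) <= k:
        return float(summary.get(labels, 0))

    estimate = float(summary.get(labels[:k], 0))
    for i in range(1, len(labels) - k + 1):
        window = labels[i:i + k]        # next length-k sub-path
        overlap = labels[i:i + k - 1]   # part shared with the previous window
        denom = summary.get(overlap, 0)
        if denom == 0:
            return 0.0
        estimate *= summary.get(window, 0) / denom
    return estimate


# Toy data graph (made up for this example).
edges = [
    (1, "a", 2), (5, "a", 2),
    (2, "b", 3), (2, "b", 6),
    (3, "c", 4), (6, "c", 7),
]

summary = count_paths(edges, max_len=2)
estimated = estimate_path_frequency(summary, ("a", "b", "c"), k=2)
actual = count_paths(edges, max_len=3)[("a", "b", "c")]
print(estimated, actual)
```

On this toy graph the estimate happens to match the true count because the independence assumption holds exactly; on real data the two generally differ, which is why the choice of which dependencies or subgraphs to keep in the summary matters.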