Given a query graph q and a data graph G, computing all occurrences of q in G, namely exact all-matching, is fundamental in graph data analysis and has a wide spectrum of real applications. It is challenging because even finding one occurrence of q in G (the subgraph isomorphism test) is NP-complete. Moreover, in many real applications, exploratory queries from users often express their real demands inaccurately. In this paper, we study the problem of efficiently computing all approximate occurrences of q in G. In particular, we study the problem of efficiently retrieving all matches of q in G in which the number of possibly missing edges is bounded by a given threshold θ, namely similarity all-matching. Similarity all-matching is at least as hard as exact all-matching, since it covers exact all-matching as the special case θ = 0. We develop a novel paradigm for similarity all-matching. Specifically, we propose to use a minimal set QT of spanning trees of q to cover all connected subgraphs q' of q missing at most θ edges; that is, each q' is spanned by a spanning tree in QT. Then, we conduct exact all-matching for each spanning tree in QT to induce all similarity matches. A rigorous theoretical analysis shows that our new search paradigm significantly reduces the number of times exact all-matching is conducted compared with existing techniques. To further speed up the computation, we develop new filtering, computation sharing, and search ordering techniques. Our comprehensive experiments on both real and synthetic datasets demonstrate that our techniques outperform the state-of-the-art technique by 7 orders of magnitude.
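To make the covering idea concrete, the following is a minimal brute-force sketch in Python, not the construction used in the paper. It assumes that each relaxed query q' keeps all vertices of q (so that it can be spanned by a spanning tree of q), enumerates every connected q' missing at most θ edges, and then picks spanning trees greedily until every q' contains one of them. The function names (`relaxed_subgraphs`, `spanning_trees`, `greedy_tree_cover`) are hypothetical, and the greedy choice yields a valid cover that is not guaranteed to be minimal. It is only practical for very small query graphs.

```python
from itertools import combinations


def is_connected(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is connected."""
    adj = {v: set() for v in vertices}
    if not adj:
        return True
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def relaxed_subgraphs(vertices, edges, theta):
    """Edge sets of all connected spanning subgraphs missing at most theta edges."""
    all_edges = frozenset(edges)
    result = set()
    for k in range(min(theta, len(edges)) + 1):
        for removed in combinations(edges, k):
            kept = all_edges - frozenset(removed)
            if is_connected(vertices, kept):
                result.add(kept)
    return result


def spanning_trees(vertices, edges):
    """All spanning trees of q: connected edge subsets of size |V| - 1."""
    n = len(set(vertices))
    return [frozenset(c) for c in combinations(edges, n - 1)
            if is_connected(vertices, c)]


def greedy_tree_cover(vertices, edges, theta):
    """Greedily pick spanning trees so that every relaxed subgraph contains one.

    Assumption: greedy set cover stands in for the paper's minimal set QT.
    """
    edges = sorted({tuple(sorted(e)) for e in edges})  # normalise undirected edges
    targets = relaxed_subgraphs(vertices, edges, theta)
    trees = spanning_trees(vertices, edges)
    cover, uncovered = [], set(targets)
    while uncovered:
        # Choose the tree contained in the largest number of uncovered subgraphs.
        best = max(trees, key=lambda t: sum(t <= s for s in uncovered))
        cover.append(best)
        uncovered = {s for s in uncovered if not best <= s}
    return cover, targets


# Usage: a triangle query with at most one missing edge.
cover, targets = greedy_tree_cover(
    ["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")], theta=1)
print(len(targets), "relaxed subgraphs covered by", len(cover), "spanning trees")
```

In this sketch, exact all-matching would then be run once per tree in `cover` rather than once per relaxed subgraph in `targets`. For a triangle the saving is small, but the gap between the two counts widens as q gets denser and θ grows.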