TreeSpan: efficiently computing similarity all-matching

Authors:
Gaoping Zhu;Xuemin Lin;Ke Zhu;Wenjie Zhang;Jeffrey Xu Yu
Affiliations:
University of New South Wales, Sydney, Australia;University of New South Wales, Sydney, Australia;University of New South Wales, Sydney, Australia;University of New South Wales, Sydney, Australia;Chinese University of Hong Kong, Hong Kong, China
Venue:
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Year:
2012

Abstract

Given a query graph q and a data graph G, computing all occurrences of q in G, namely exact all-matching, is fundamental to graph data analysis and has a wide spectrum of real applications. It is challenging because even finding one occurrence of q in G (the subgraph isomorphism test) is NP-complete. In many real applications, however, users' exploratory queries express their actual needs only approximately. In this paper, we study the problem of efficiently computing all approximate occurrences of q in G. Specifically, we retrieve all matches of q in G in which the number of missing edges is bounded by a given threshold θ, namely similarity all-matching. Similarity all-matching is at least as hard as exact all-matching, since it includes exact all-matching as the special case θ = 0. We develop a novel paradigm for similarity all-matching. We propose using a minimal set QT of spanning trees of q to cover all connected subgraphs q' of q that miss at most θ edges; that is, each q' is spanned by some spanning tree in QT. We then conduct exact all-matching for each spanning tree in QT to induce all similarity matches. A rigorous theoretical analysis shows that this new search paradigm significantly reduces the number of exact all-matching runs compared with existing techniques. To further speed up the computation, we develop new filtering, computation-sharing, and search-ordering techniques. Our comprehensive experiments on both real and synthetic datasets demonstrate that our techniques outperform the state-of-the-art technique by 7 orders of magnitude.
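The covering idea can be illustrated with a small brute-force sketch in Python. The following choices are assumptions and do not come from the paper. Graphs are unlabeled and undirected, given as vertex lists and (u, v) edge tuples. QT is approximated with a greedy set cover over all spanning trees, which is not necessarily minimal and is not the authors' construction. Each tree is matched by plain backtracking, without the paper's filtering, computation-sharing, or search-ordering techniques. All function names are hypothetical. The key observation it encodes is that a spanning tree T of q spans q' = q − D exactly when T shares no edge with the deleted set D.

```python
from itertools import combinations

# Hypothetical sketch (not the authors' implementation): graphs are unlabeled and
# undirected, given as a vertex list plus a list of (u, v) edge tuples.

def is_connected(vertices, edges):
    """True if (vertices, edges) is connected; checked with an iterative DFS."""
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)

def spanning_trees(vertices, edges):
    """Brute force: every connected (|V| - 1)-edge subset of q is a spanning tree."""
    return [frozenset(t) for t in combinations(edges, len(vertices) - 1)
            if is_connected(vertices, t)]

def deletion_sets(vertices, edges, theta):
    """Every set D of at most theta edges whose removal leaves q' = q - D connected."""
    result = []
    for k in range(theta + 1):
        for d in combinations(edges, k):
            if is_connected(vertices, [e for e in edges if e not in d]):
                result.append(frozenset(d))
    return result

def greedy_tree_cover(vertices, edges, theta):
    """Assumed stand-in for QT: greedy set cover, so the result may not be minimal.
    Tree T covers q' = q - D exactly when T and D share no edge."""
    trees = spanning_trees(vertices, edges)
    uncovered = set(deletion_sets(vertices, edges, theta))
    cover = []
    while uncovered:
        best = max(trees, key=lambda t: sum(1 for d in uncovered if not (t & d)))
        cover.append(best)
        uncovered = {d for d in uncovered if best & d}
    return cover

def tree_embeddings(tree, root, g_adj):
    """Exact all-matching of one spanning tree by backtracking in BFS order."""
    tadj = {}
    for u, v in tree:
        tadj.setdefault(u, []).append(v)
        tadj.setdefault(v, []).append(u)
    order, parent, head = [root], {root: None}, 0
    while head < len(order):
        for w in tadj.get(order[head], []):
            if w not in parent:
                parent[w] = order[head]
                order.append(w)
        head += 1

    def extend(i, mapping, used):
        if i == len(order):
            yield dict(mapping)
            return
        u = order[i]
        # The root may map to any G vertex; others must be adjacent to the parent's image.
        candidates = g_adj if parent[u] is None else g_adj[mapping[parent[u]]]
        for x in candidates:
            if x not in used:
                mapping[u] = x
                used.add(x)
                yield from extend(i + 1, mapping, used)
                del mapping[u]
                used.discard(x)

    yield from extend(0, {}, set())

def similarity_all_matching(q_vertices, q_edges, g_edges, theta):
    """All injective maps of q into G under which at most theta query edges are missing."""
    g_adj = {}
    for u, v in g_edges:
        g_adj.setdefault(u, set()).add(v)
        g_adj.setdefault(v, set()).add(u)
    matches = set()
    for tree in greedy_tree_cover(q_vertices, q_edges, theta):
        for m in tree_embeddings(tree, q_vertices[0], g_adj):
            # Count query edges whose images are not edges of G (tree edges always are).
            missing = sum(1 for u, v in q_edges if m[v] not in g_adj[m[u]])
            if missing <= theta:
                matches.add(tuple(sorted(m.items())))
    return matches
```

For example, take q to be a triangle on vertices 0, 1, 2, G the path 10-11-12, and θ = 1. Each of the 6 bijections sends exactly one query edge to the non-edge (10, 12), so `similarity_all_matching` returns all 6. With θ = 0 it returns nothing, which is plain exact all-matching. The enumeration is exponential and only suitable for toy queries. The greedy cover is also not guaranteed to be minimal. On K4 with θ = 1 it picks 3 trees, yet the two edge-disjoint spanning paths 0-1-2-3 and 1-3-0-2 already cover all 7 connected q'. Matching each q' separately would take 7 runs.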