Efficient processing of graph similarity queries with edit distance constraints

Authors:
Xiang Zhao;Chuan Xiao;Xuemin Lin;Wei Wang;Yoshiharu Ishikawa
Affiliations:
The University of New South Wales, Sydney, Australia and NICTA, Sydney, Australia;Nagoya University, Nagoya, Japan;The University of New South Wales, Sydney, Australia;The University of New South Wales, Sydney, Australia;Nagoya University, Nagoya, Japan
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2013




Abstract

Graphs are widely used to model complicated data semantics in many applications, including bioinformatics, chemistry, social networks, and pattern recognition. A recent trend is to tolerate noise arising from various sources, such as erroneous data entries, and to find similarity matches. In this paper, we study graph similarity queries with edit distance constraints. Inspired by the $q$-gram idea for string similarity problems, our solution extracts paths from graphs as features for indexing. We establish a lower bound on the number of common features to generate candidates. Efficient algorithms are proposed to handle three types of graph similarity queries by exploiting both matching and mismatching features, as well as degree information, to improve the filtering and verification of candidates. Extensive experiments on real and synthetic datasets demonstrate that the proposed algorithms significantly outperform existing approaches.
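
The path-based feature idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes undirected graphs with vertex labels only (no edge labels), treats a q-gram as the label sequence of a simple path with q edges, and takes the number of q-grams that one edit operation can affect as a caller-supplied parameter. The function names `path_qgrams`, `common_qgrams` and `passes_count_filter`, and the exact form of the bound, are assumptions made for illustration.

```python
from collections import Counter


def path_qgrams(adj, labels, q):
    """Return the multiset of path-based q-grams of an undirected graph.

    adj    : dict mapping each vertex to an iterable of its neighbours
    labels : dict mapping each vertex to its label
    q      : number of edges per path (q >= 1)
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    grams = Counter()

    def extend(path):
        if len(path) == q + 1:
            seq = tuple(labels[v] for v in path)
            # A path and its reverse are the same undirected path,
            # so store the smaller of the two label sequences.
            grams[min(seq, seq[::-1])] += 1
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # keep the path simple
                extend(path + [nxt])

    for v in adj:
        extend([v])
    # Every path was enumerated once from each endpoint, so halve the counts.
    return Counter({gram: count // 2 for gram, count in grams.items()})


def common_qgrams(qa, qb):
    """Size of the multiset intersection of two q-gram multisets."""
    return sum((qa & qb).values())


def passes_count_filter(qa, qb, tau, affected_a, affected_b):
    """Count filter sketch: keep a pair only if it shares enough q-grams.

    affected_a / affected_b are assumed upper bounds on how many q-grams
    a single edit operation can affect in each graph (hypothetical inputs).
    """
    size_a, size_b = sum(qa.values()), sum(qb.values())
    lower_bound = max(size_a - tau * affected_a, size_b - tau * affected_b)
    return common_qgrams(qa, qb) >= lower_bound


# Two small labeled paths: C-C-O and C-C-N.
adj = {0: [1], 1: [0, 2], 2: [1]}
g1 = path_qgrams(adj, {0: "C", 1: "C", 2: "O"}, q=1)
g2 = path_qgrams(adj, {0: "C", 1: "C", 2: "N"}, q=1)
print(common_qgrams(g1, g2))
print(passes_count_filter(g1, g2, tau=1, affected_a=2, affected_b=2))
```

Pairs that fail such a filter cannot be within the edit distance threshold, provided the per-edit bound is valid, so only the surviving candidates need the expensive exact edit distance computation.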