NeMa: fast graph search with label similarity

Authors:
Arijit Khan; Yinghui Wu; Charu C. Aggarwal; Xifeng Yan
Affiliations:
Computer Science, University of California, Santa Barbara; Computer Science, University of California, Santa Barbara; IBM T. J. Watson Research, Hawthorne, NY; Computer Science, University of California, Santa Barbara
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

It is increasingly common to find real-life data represented as networks of labeled, heterogeneous entities. To query these networks, one often needs to identify the matches of a given query graph in a (typically large) network modeled as a target graph. Because the target graph is noisy and lacks a fixed schema, the query graph can differ substantially from its matches in both structure and node labels, which makes graph querying challenging. In this paper, we propose NeMa (Network Match), a neighborhood-based subgraph matching technique for querying real-life networks. (1) To measure the quality of a match, we propose a novel subgraph matching cost metric that aggregates the costs of matching individual nodes and unifies structure and node label similarities. (2) Based on this metric, we formulate the minimum cost subgraph matching problem: given a query graph and a target graph, identify the (top-k) matches of the query graph with minimum cost in the target graph. We show that the problem is NP-hard and also hard to approximate. (3) We propose a heuristic algorithm for solving the problem based on an inference model, together with optimization techniques that improve its efficiency. (4) We empirically verify that NeMa is both effective and efficient compared with keyword search and various state-of-the-art graph querying techniques.
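To make the problem statement in (1) and (2) concrete, the following is a minimal Python sketch of a top-k minimum cost matching on toy graphs. It is not NeMa's metric or algorithm: the label measure (Jaccard distance over word tokens), the structural penalty (share of query edges not preserved), the function names, and the example graphs are all assumptions chosen for illustration, and the search is plain brute-force enumeration, not the inference-based heuristic the abstract mentions.

```python
from heapq import nsmallest
from itertools import permutations


def label_cost(a, b):
    """Assumed label measure: 1 - Jaccard similarity over lowercase word tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not (ta | tb):
        return 0.0  # two empty labels are treated as identical
    return 1.0 - len(ta & tb) / len(ta | tb)


def match_cost(q_adj, q_lab, t_adj, t_lab, f):
    """Aggregate per-node costs: label mismatch plus share of query edges not preserved."""
    total = 0.0
    for v, u in f.items():
        nbrs = q_adj[v]
        missing = sum(1 for w in nbrs if f[w] not in t_adj[u])
        structure = missing / len(nbrs) if nbrs else 0.0
        total += label_cost(q_lab[v], t_lab[u]) + structure
    return total


def top_k_matches(q_adj, q_lab, t_adj, t_lab, k=1):
    """Score every injective mapping of query nodes to target nodes.

    Exponential in the query size, so this is only usable on toy inputs.
    """
    q_nodes = list(q_adj)
    candidates = (
        dict(zip(q_nodes, image))
        for image in permutations(t_adj, len(q_nodes))
    )
    scored = ((match_cost(q_adj, q_lab, t_adj, t_lab, f), f) for f in candidates)
    return nsmallest(k, scored, key=lambda pair: pair[0])


# Hypothetical query graph: one edge between an "actor" node and a "movie" node.
query_adj = {"a": {"b"}, "b": {"a"}}
query_lab = {"a": "actor", "b": "movie"}

# Hypothetical target graph: a path 1 - 2 - 3 with noisier labels.
target_adj = {1: {2}, 2: {1, 3}, 3: {2}}
target_lab = {1: "film actor", 2: "action movie", 3: "director"}

best = top_k_matches(query_adj, query_lab, target_adj, target_lab, k=2)
for cost, mapping in best:
    print(cost, mapping)
```

In this toy example the cheapest mapping sends "actor" to "film actor" and "movie" to "action movie": neither label matches exactly, but the labels overlap and the query edge is preserved. The exhaustive enumeration also shows why a heuristic is needed in practice, since the number of candidate mappings grows exponentially with the query size.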