Neighborhood based fast graph search in large networks

Authors:
Arijit Khan;Nan Li;Xifeng Yan;Ziyu Guan;Supriyo Chakraborty;Shu Tao
Affiliations:
University of California, Santa Barbara, Santa Barbara, CA, USA;University of California, Santa Barbara, Santa Barbara, CA, USA;University of California, Santa Barbara, Santa Barbara, CA, USA;University of California, Santa Barbara, Santa Barbara, CA, USA;University of California, Los Angeles, Los Angeles, CA, USA;IBM T. J. Watson Research Center, Hawthorne, NY, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Abstract

Searching complex social and information networks has become important for a variety of applications. At the core of these applications lies a common and critical problem: given a labeled network and a query graph, how can the query graph be searched for efficiently in the target network? Noise, together with incomplete knowledge of the structure and content of the target network, makes it unrealistic to find an exact match. It is more appealing to find the top-k approximate matches. In this paper, we propose a neighborhood-based similarity measure that avoids costly graph isomorphism and edit distance computation. Under this new measure, we prove that subgraph similarity search is NP-hard, while graph similarity match is polynomial. By studying the principles behind this measure, we found an information propagation model that converts a large network into a set of multidimensional vectors, for which sophisticated indexing and similarity search algorithms are available. The proposed method, called Ness (Neighborhood Based Similarity Search), is appropriate for graphs with low automorphism and high noise, which are common in many social and information networks. Ness is not only efficient but also robust against structural noise and information loss. Empirical results show that it can quickly and accurately find high-quality matches in large networks with negligible cost.
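The central idea above, turning each node's labeled neighborhood into a vector so that matching reduces to vector comparison, can be illustrated with a small sketch. It assumes a decay factor `alpha` (a label seen i hops away contributes `alpha**i`), a hop limit `h`, and a one-sided cost that penalizes a target node only for label strength the query expects but the target lacks. These parameters, the weighting, and the helper names `neighborhood_vector` and `neighborhood_cost` are hypothetical illustrations, not necessarily the exact propagation model or cost function that Ness defines.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical graph representation: adjacency lists plus a label set per node.
Graph = Dict[str, List[str]]
Labels = Dict[str, Set[str]]


def neighborhood_vector(graph: Graph, labels: Labels, u: str,
                        alpha: float = 0.5, h: int = 2) -> Dict[str, float]:
    """Propagate labels from nodes within h hops of u via BFS.

    A node at shortest-path distance i > 0 adds alpha**i to each of its
    labels (assumed decay model); u's own labels are not counted.
    """
    dist = {u: 0}
    queue = deque([u])
    vec: Dict[str, float] = {}
    while queue:
        v = queue.popleft()
        d = dist[v]
        if d > 0:
            for label in labels.get(v, set()):
                vec[label] = vec.get(label, 0.0) + alpha ** d
        if d == h:
            continue  # do not expand beyond the hop limit
        for w in graph.get(v, []):
            if w not in dist:
                dist[w] = d + 1
                queue.append(w)
    return vec


def neighborhood_cost(query_vec: Dict[str, float],
                      target_vec: Dict[str, float]) -> float:
    """Sum of positive shortfalls of the target relative to the query.

    Extra labels in the target neighborhood are not penalized (assumed
    one-sided cost), which tolerates noise added around a true match.
    """
    return sum(max(q - target_vec.get(label, 0.0), 0.0)
               for label, q in query_vec.items())


if __name__ == "__main__":
    G = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
    L = {"a": {"x"}, "b": {"y"}, "c": {"z"}, "d": {"z"}}
    target = neighborhood_vector(G, L, "a", alpha=0.5, h=2)
    print(target)                                    # {'y': 0.5, 'z': 0.75}
    print(neighborhood_cost({"y": 0.5}, target))     # 0.0
```

In this toy graph, node `a` sees `y` at one hop (weight 0.5) and `z` at one and two hops (0.5 + 0.25), so a query node whose neighborhood contains only `y` at one hop matches `a` at zero cost. Because each node becomes a fixed-dimensional label vector, candidate nodes could then be filtered with standard vector indexing before any structural verification.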