Towards graph containment search and indexing

Authors:
Chen Chen;Xifeng Yan;Philip S. Yu;Jiawei Han;Dong-Qing Zhang;Xiaohui Gu
Affiliations:
University of Illinois at Urbana-Champaign;IBM T. J. Watson Research Center;IBM T. J. Watson Research Center;University of Illinois at Urbana-Champaign;Thomson Research;IBM T. J. Watson Research Center
Venue:
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Year:
2007


Abstract

Given a set of model graphs D and a query graph q, containment search aims to find all model graphs g ∈ D such that q contains g (q ⊇ g). Due to the wide adoption of graph models, fast containment search of graph data finds many applications in various domains. In comparison to traditional graph search that retrieves all the graphs containing q (q ⊆ g), containment search has its own indexing characteristics that have not yet been examined. In this paper, we perform a systematic study on these characteristics and propose a contrast subgraph-based indexing model, called cIndex. Contrast subgraphs capture the structural differences between model graphs and query graphs, and are thus well suited to indexing because of their high selectivity. Using a redundancy-aware feature selection process, cIndex can select a set of significant and distinctive contrast subgraphs and maximize its indexing capability. We show that it is NP-complete to choose the best set of indexing features, and that our greedy algorithm can approximate the one-level optimal index within a ratio of 1 − 1/e. Taking this solution as a base indexing model, we further extend it to accommodate hierarchical indexing methodologies and apply data space clustering and sampling techniques to reduce the index construction time. The proposed methodology provides a general solution to containment search and indexing, not only for graphs but also for any data with transitive relations. Experimental results on real test data show that cIndex achieves near-optimal pruning power on various containment search workloads and confirm its clear advantage over indices built for traditional graph search in this new scenario.
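The pruning idea behind contrast features and the greedy selection can be illustrated with a minimal sketch. It is not the paper's implementation: graphs are replaced by frozensets and subgraph containment by the subset test, which is one instance of "data with transitive relations". The objective used here (number of query/model pairs pruned over a sample query log) is an assumed coverage-style objective, and all function names are hypothetical. The pruning rule itself follows from transitivity: if a feature f is contained in a model g but not in the query q, then q cannot contain g.

```python
from itertools import product


def contains(big, small):
    """Stand-in for graph containment: plain subset test on frozensets (assumption)."""
    return small <= big


def pruned_pairs(feature, models, queries):
    """Return the (query index, model index) pairs that `feature` rules out.

    A model holding the feature cannot be contained in a query lacking it.
    """
    holders = [j for j, g in enumerate(models) if contains(g, feature)]
    missing = [i for i, q in enumerate(queries) if not contains(q, feature)]
    return set(product(missing, holders))


def greedy_select(candidates, models, queries, k):
    """Pick up to k features, each time taking the largest marginal gain.

    This is the standard greedy for a coverage objective (assumed here).
    """
    cover = {f: pruned_pairs(f, models, queries) for f in candidates}
    chosen, covered = [], set()
    for _ in range(k):
        best = max(cover, key=lambda f: len(cover[f] - covered), default=None)
        if best is None or not cover[best] - covered:
            break  # no remaining feature prunes anything new
        chosen.append(best)
        covered |= cover.pop(best)
    return chosen, covered


def containment_search(query, models, index_features):
    """Filter with the selected features, then verify the surviving models."""
    absent = [f for f in index_features if not contains(query, f)]
    survivors = [g for g in models if not any(contains(g, f) for f in absent)]
    return [g for g in survivors if contains(query, g)]
```

With real graphs, `contains` would be a subgraph isomorphism test, which is the expensive step that the filter is meant to avoid calling on every model graph.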