Efficient query processing on graph databases

Authors:
James Cheng;Yiping Ke;Wilfred Ng
Affiliations:
Nanyang Technological University, Singapore;The Chinese University of Hong Kong, New Territories, Hong Kong;The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2009


Abstract

We study the problem of processing subgraph queries on a database that consists of a set of graphs. The answer to a subgraph query is the set of graphs in the database that are supergraphs of the query. In this article, we propose an efficient index, FG*-index, to solve this problem. The cost of processing a subgraph query using most existing indexes consists mainly of two parts: the index probing cost and the candidate verification cost. Index probing finds the query in the index, or finds the graphs from which a candidate answer set for the query can be generated. Candidate verification tests whether each graph in the candidate set is indeed a supergraph of the query. We design FG*-index to minimize these two costs as follows. FG*-index consists of three components: the FG-index, the feature-index, and the FAQ-index. First, the FG-index employs the concept of Frequent subGraph (FG) to allow the set of queries that are FGs to be answered without candidate verification. We call this set of queries FG-queries. We can enlarge the set of FG-queries so that more queries can be answered without candidate verification; however, a larger set of FG-queries implies a larger FG-index, and hence a higher index probing cost. We propose the feature-index to reduce the index probing cost. The feature-index uses features to filter out false results that are matched in the FG-index, so that we can quickly find the truly matching graphs for a query. For processing non-FG-queries, we propose the FAQ-index, which is dynamically constructed from the set of Frequently Asked non-FG-Queries (FAQs). Using the FAQ-index, verification is not required for processing FAQs, and only a small number of candidates need to be verified for processing non-FG-queries that are not frequently asked.
Finally, a comprehensive set of experiments verifies that query processing using FG*-index is up to orders of magnitude more efficient than with state-of-the-art indexes, and that it is also more scalable.
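
To make the two costs in the abstract concrete, the following is a minimal Python sketch of generic filter-and-verify subgraph query processing. It is not the FG*-index: the graph representation, the use of labeled edges as features, and all function names (`make_graph`, `edge_features`, `build_index`, `answer`) are assumptions chosen for illustration. The only idea borrowed from the abstract is that a query which is itself an indexed structure (here, a single labeled edge) can be answered directly from the index, while any other query produces a candidate set that must be verified.

```python
from itertools import permutations

# Hypothetical representation: a graph is (labels, edges), where labels maps
# vertex -> label and edges is a set of frozenset({u, v}) (undirected, no loops).
def make_graph(labels, edge_list):
    return labels, {frozenset(e) for e in edge_list}

def edge_features(graph):
    """Features of a graph: the sorted label pair of every edge."""
    labels, edges = graph
    return {tuple(sorted(labels[v] for v in e)) for e in edges}

def is_subgraph(query, graph):
    """Candidate verification: brute-force, label-preserving subgraph
    isomorphism test. Exponential time, for illustration only."""
    q_labels, q_edges = query
    g_labels, g_edges = graph
    q_vertices = list(q_labels)
    for image in permutations(g_labels, len(q_vertices)):
        mapping = dict(zip(q_vertices, image))
        labels_ok = all(q_labels[v] == g_labels[mapping[v]] for v in q_vertices)
        edges_ok = all(frozenset(mapping[v] for v in e) in g_edges for e in q_edges)
        if labels_ok and edges_ok:
            return True
    return False

def build_index(database):
    """Map each edge feature to the ids of the graphs that contain it."""
    index = {}
    for gid, graph in database.items():
        for feat in edge_features(graph):
            index.setdefault(feat, set()).add(gid)
    return index

def answer(query, database, index):
    """Return (answer set, number of graphs that had to be verified)."""
    feats = edge_features(query)
    # Index probing: intersect the id sets of all features of the query.
    if feats:
        candidates = set.intersection(*(index.get(f, set()) for f in feats))
    else:
        candidates = set(database)
    # A single-edge query is itself an indexed feature, so the probed set is
    # already the exact answer and no verification is needed.
    if len(query[0]) == 2 and len(query[1]) == 1:
        return candidates, 0
    # Otherwise every candidate must be verified.
    verified = {gid for gid in candidates if is_subgraph(query, database[gid])}
    return verified, len(candidates)

database = {
    "g1": make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)]),
    "g2": make_graph({1: "C", 2: "O", 3: "C", 4: "C"}, [(1, 2), (3, 4)]),
    "g3": make_graph({1: "C", 2: "N"}, [(1, 2)]),
}
index = build_index(database)

q_path = make_graph({"a": "C", "b": "C", "c": "O"}, [("a", "b"), ("b", "c")])
q_edge = make_graph({"a": "C", "b": "O"}, [("a", "b")])

path_result = answer(q_path, database, index)  # g2 is a false candidate
edge_result = answer(q_edge, database, index)  # answered from the index alone
```

In this toy database, the path query probes two candidates (`g1` and `g2`) and verification discards `g2`, whereas the single-edge query is answered with zero verifications. The abstract's design goal is to push as many queries as possible into the second category without letting the index grow too expensive to probe.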