GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases

Authors: Sayan Ranu; Ambuj K. Singh
Venue: ICDE '09, Proceedings of the 2009 IEEE International Conference on Data Engineering
Year: 2009

Abstract

Graphs are increasingly used to model a wide range of scientific data, and this widespread usage has generated considerable interest in mining patterns from graph databases. While an array of techniques exists to mine frequent patterns, we still lack a scalable approach to mine statistically significant patterns, specifically patterns with low p-values that occur at low frequencies. We propose a highly scalable technique, called GraphSig, to mine significant subgraphs from large graph databases. We convert each graph into a set of feature vectors, where each vector represents a region within the graph. Domain knowledge is used to select a meaningful feature set. Prior probabilities of features are computed empirically to evaluate the statistical significance of patterns in the feature space. Following this analysis in the feature space, only a small portion of the exponential search space is accessed for further analysis. This enables the use of existing frequent subgraph mining techniques to mine significant patterns in a scalable manner, even when they are infrequent. Extensive experiments on the proposed techniques demonstrate that GraphSig is effective and efficient for mining significant patterns. To further demonstrate the power of significant patterns, we develop a classifier using patterns mined by GraphSig. Experimental results show that the proposed classifier outperforms state-of-the-art classifiers in both quality and computation cost.
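The feature-space step can be illustrated with a small sketch. It is not the authors' implementation, and several pieces are assumptions: each region is taken to be a node's k-hop neighbourhood summarised by node-label counts (a hypothetical stand-in for GraphSig's own region-to-vector mapping), features are treated as independent, and a pattern vector's p-value is modelled as the binomial upper tail of its observed support, using empirically estimated priors. All function names (`region_vectors`, `empirical_tail_priors`, `p_value`) are hypothetical.

```python
from collections import Counter
from math import comb


def region_vectors(graph, feature_labels, radius=1):
    """One feature vector per node: counts of each label within `radius` hops.

    Hypothetical toy representation: graph = (labels, adjacency), where
    labels[v] is node v's label and adjacency[v] lists its neighbours.
    """
    labels, adj = graph
    vectors = []
    for start in range(len(labels)):
        seen = {start}
        frontier = [start]
        for _ in range(radius):  # breadth-first expansion, one hop per round
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in seen:
                        seen.add(w)
                        nxt.append(w)
            frontier = nxt
        counts = Counter(labels[v] for v in seen)
        vectors.append(tuple(counts.get(f, 0) for f in feature_labels))
    return vectors


def empirical_tail_priors(vectors):
    """priors[i][v] = fraction of vectors whose i-th feature is >= v."""
    n = len(vectors)
    priors = []
    for i in range(len(vectors[0])):
        values = [vec[i] for vec in vectors]
        priors.append({v: sum(x >= v for x in values) / n
                       for v in range(max(values) + 1)})
    return priors


def dominates(vec, pattern):
    """True if `vec` is at least `pattern` in every feature (a super-vector)."""
    return all(a >= b for a, b in zip(vec, pattern))


def p_value(pattern, vectors, priors):
    """Binomial upper-tail p-value of the pattern's observed support."""
    # Assumed independence: P(random vector dominates pattern) = product of tails.
    prob = 1.0
    for i, v in enumerate(pattern):
        prob *= priors[i].get(v, 0.0)  # values above the observed max get 0
    m = len(vectors)
    support = sum(dominates(vec, pattern) for vec in vectors)
    # P[X >= support] with X ~ Binomial(m, prob)
    return sum(comb(m, k) * prob**k * (1 - prob)**(m - k)
               for k in range(support, m + 1))


# Usage on two toy chain graphs, C-C-O and C-N-C.
g1 = (["C", "C", "O"], [[1], [0, 2], [1]])
g2 = (["C", "N", "C"], [[1], [0, 2], [1]])
features = ["C", "N", "O"]
vecs = region_vectors(g1, features) + region_vectors(g2, features)
priors = empirical_tail_priors(vecs)
print(p_value((2, 0, 1), vecs, priors))
```

Under these assumptions, only regions whose vectors come out significant (low p-value) would be handed to an off-the-shelf frequent subgraph miner. This matches the abstract's point that only a small portion of the search space is examined, although the paper's exact thresholds and feature choices are not reproduced here.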