GraphRank: Statistical Modeling and Mining of Significant Subgraphs in the Feature Space

Authors:
Huahai He;Ambuj K. Singh
Affiliations:
University of California, Santa Barbara, USA;University of California, Santa Barbara, USA
Venue:
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Year:
2006

Citing 0
Cited 5

Mining significant graph patterns by leap search

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Indexing and mining topological patterns for drug discovery

Proceedings of the 15th International Conference on Extending Database Technology
Significant motifs in time series

Statistical Analysis and Data Mining
Extraction of statistically significant malware behaviors

Proceedings of the 29th Annual Computer Security Applications Conference
A polynomial-time maximum common subgraph algorithm for outerplanar graphs and its application to chemoinformatics

Annals of Mathematics and Artificial Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose a technique for evaluating the statistical significance of frequent subgraphs in a database. A graph is represented by a feature vector that is a histogram over a set of basis elements. The set of basis elements is chosen based on domain knowledge and consists generally of vertices, edges, or small graphs. A given subgraph is transformed to a feature vector and the significance of the subgraph is computed by considering the significance of occurrence of the corresponding vector. The probability of occurrence of the vector in a random vector is computed based on the prior probability of the basis elements. This is then used to obtain a probability distribution on the support of the vector in a database of random vectors. The statistical significance of the vector/subgraph is then defined as the p-value of its observed support. We develop efficient methods for computing p-values and lower bounds. A simplified model is further proposed to improve the efficiency. We also address the problem of feature vector mining, a generalization of itemset mining where counts are associated with items and the goal is to find significant sub-vectors. We present an algorithm that explores closed frequent sub-vectors to find significant ones. Experimental results show that the proposed techniques are effective, efficient, and useful for ranking frequent subgraphs by their statistical significance.