The complexity of mining maximal frequent subgraphs

Authors:
Benny Kimelfeld; Phokion G. Kolaitis
Affiliations:
IBM Research - Almaden, San Jose, CA, USA; UC Santa Cruz and IBM Research - Almaden, Santa Cruz, CA, USA
Venue:
Proceedings of the 32nd symposium on Principles of database systems
Year:
2013

Abstract

Given a collection of graphs and a threshold t, a frequent subgraph is a graph that is isomorphic to a subgraph of at least t graphs in the collection. Frequent subgraphs generalize frequent itemsets and arise in various contexts, from bioinformatics to the Web. Since the space of frequent subgraphs is typically extremely large, research in graph mining has focused on special types of frequent subgraphs that can be orders of magnitude smaller in number, yet encapsulate the space of all frequent subgraphs. Maximal frequent subgraphs (i.e., the ones not properly contained in any other frequent subgraph) constitute the most useful such type. In this paper, we embark on a comprehensive investigation of the computational complexity of mining maximal frequent subgraphs. Our study is carried out by considering the effect of three different parameters: possible restrictions on the class of graphs; a fixed bound on the threshold; and a fixed bound on the number of desired answers. We focus on specific classes of connected graphs: general graphs, planar graphs, graphs of bounded degree, and graphs of bounded tree-width (trees being a special case). Moreover, each class has two variants: one in which the nodes are unlabeled, and one in which they are uniquely labeled. We delineate the complexity of the enumeration problem for each of these variants by determining when it is solvable in (total or incremental) polynomial time and when it is NP-hard. Specifically, for the labeled classes, we show that bounding the threshold yields tractability but, in most cases, bounding the number of answers does not, unless P=NP; an exception is the case of labeled trees, where bounding either of these two parameters yields tractability. The state of affairs turns out to be quite different for the unlabeled classes.
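To make the definitions concrete for the uniquely labeled case, here is a minimal brute-force sketch in Python. It assumes label-preserving isomorphism and considers only patterns with at least one edge (both simplifications chosen for illustration). Under these assumptions a graph can be identified with its set of labeled edges, so "isomorphic to a subgraph of G" becomes edge-set containment. The helper names (`edge`, `is_connected`, `maximal_frequent_subgraphs`) are hypothetical, and the enumeration is exponential; it restates the definitions and is not one of the algorithms analyzed in the paper.

```python
from itertools import combinations


def edge(u, v):
    """An undirected edge between two node labels (hypothetical helper)."""
    return frozenset((u, v))


def is_connected(edges):
    """True if the nonempty edge set forms a single connected graph."""
    if not edges:
        return False
    nodes = set().union(*edges)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:  # depth-first search over the pattern's own edges
        x = stack.pop()
        for e in edges:
            if x in e:
                for y in e - {x}:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
    return seen == nodes


def maximal_frequent_subgraphs(graphs, threshold):
    """Connected edge sets contained in at least `threshold` graphs and not
    properly contained in another such edge set (brute force, tiny inputs only)."""
    universe = sorted(set().union(*graphs), key=sorted)
    frequent = []
    for k in range(1, len(universe) + 1):
        for combo in combinations(universe, k):
            pattern = frozenset(combo)
            support = sum(1 for g in graphs if pattern <= g)
            if support >= threshold and is_connected(pattern):
                frequent.append(pattern)
    # Keep only patterns with no frequent proper superset.
    return [p for p in frequent if not any(p < q for q in frequent)]


# Three uniquely labeled graphs over node labels a..e, threshold 2.
g1 = {edge("a", "b"), edge("b", "c"), edge("c", "d")}
g2 = {edge("a", "b"), edge("b", "c"), edge("c", "e")}
g3 = {edge("a", "b"), edge("c", "d")}
result = maximal_frequent_subgraphs([g1, g2, g3], 2)
print(sorted(sorted("".join(sorted(e)) for e in p) for p in result))
# → [['ab', 'bc'], ['cd']]
```

The edge set {ab, cd} also occurs in two of the graphs but is rejected because it is disconnected. Dropping the connectivity test turns the sketch into maximal frequent itemset mining with edges as items, which is the sense in which frequent subgraphs generalize frequent itemsets.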
The main (and most challenging to prove) result concerns unlabeled trees: we show NP-hardness, even if the input consists of two trees, and both the threshold and the number of desired answers are equal to just two. In other words, we establish that the following problem is NP-complete: given two unlabeled trees, do they have more than one maximal subtree in common?
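The decision problem in the last sentence can be stated as code. The Python sketch below is a brute-force illustration for tiny unlabeled trees and has nothing to do with the paper's hardness proof. It enumerates every connected node subset of each tree. Each subtree is identified up to isomorphism by an AHU-style parenthesis encoding rooted at the tree's center, a standard tree-canonization technique chosen here for illustration. The sketch then keeps the common subtrees not contained in a larger common subtree. All function names are hypothetical, and the running time is exponential in the tree sizes, which is consistent with the NP-completeness result.

```python
from itertools import combinations


def tree(edges):
    """Adjacency lists for an undirected tree given as an edge list (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def connected_subsets(adj):
    """Yield every nonempty node set that induces a connected subtree of `adj`."""
    nodes = list(adj)
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            chosen, seen, stack = set(subset), {subset[0]}, [subset[0]]
            while stack:  # DFS restricted to the chosen nodes
                x = stack.pop()
                for y in adj[x]:
                    if y in chosen and y not in seen:
                        seen.add(y)
                        stack.append(y)
            if seen == chosen:
                yield frozenset(chosen)


def canonical_form(adj, nodes):
    """AHU-style string for the unlabeled tree induced on `nodes`, rooted at its
    center (the smaller string is taken when the tree has two centers)."""
    nbrs = {v: [u for u in adj[v] if u in nodes] for v in nodes}

    def encode(v, parent):
        children = sorted(encode(u, v) for u in nbrs[v] if u != parent)
        return "(" + "".join(children) + ")"

    remaining = set(nodes)
    degree = {v: len(nbrs[v]) for v in nodes}
    leaves = [v for v in nodes if degree[v] <= 1]
    while len(remaining) > 2:  # peel leaves layer by layer down to the center(s)
        next_leaves = []
        for leaf in leaves:
            remaining.discard(leaf)
            for u in nbrs[leaf]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] == 1:
                        next_leaves.append(u)
        leaves = next_leaves
    return min(encode(c, None) for c in remaining)


def subtree_forms(adj):
    """Map each isomorphism class of subtrees to one witness node set."""
    forms = {}
    for s in connected_subsets(adj):
        forms.setdefault(canonical_form(adj, s), s)
    return forms


def maximal_common_subtrees(adj1, adj2):
    """Canonical forms of the maximal common subtrees of two unlabeled trees."""
    forms1, forms2 = subtree_forms(adj1), subtree_forms(adj2)
    common = {f: w for f, w in forms1.items() if f in forms2}
    maximal = []
    for f, w in common.items():
        contained = False
        for w_big in common.values():
            if len(w_big) <= len(w):
                continue
            # Is f isomorphic to a subtree of this larger common subtree?
            sub = {v: [u for u in adj1[v] if u in w_big] for v in w_big}
            if any(len(s) == len(w) and canonical_form(sub, s) == f
                   for s in connected_subsets(sub)):
                contained = True
                break
        if not contained:
            maximal.append(f)
    return maximal


# Two spiders: legs of lengths 2, 2, 1 versus legs of lengths 3, 1, 1.
t1 = tree([("c", "a1"), ("a1", "a2"), ("c", "b1"), ("b1", "b2"), ("c", "d")])
t2 = tree([(0, 1), (1, 2), (2, 3), (0, 4), (0, 5)])
print(len(maximal_common_subtrees(t1, t2)))  # → 2
```

For these two trees the maximal common subtrees are the 5-node path and the 5-node spider with legs 2, 1, 1. Neither contains the other, so the answer to "more than one maximal subtree in common?" is yes.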