SPIN: mining maximal frequent subgraphs from graph databases

Authors:
Jun Huan;Wei Wang;Jan Prins;Jiong Yang
Affiliations:
University of North Carolina at Chapel Hill, Chapel Hill, NC;University of North Carolina at Chapel Hill, Chapel Hill, NC;University of North Carolina at Chapel Hill, Chapel Hill, NC;University of Illinois, Urbana-Champaign, IL
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Abstract

One fundamental challenge in mining recurring subgraphs from semi-structured data sets is the overwhelming abundance of such patterns. In large graph databases, the total number of frequent subgraphs can become too large to allow a full enumeration using reasonable computational resources. In this paper, we propose a new algorithm that mines only maximal frequent subgraphs, i.e., subgraphs that are not part of any other frequent subgraph. In the best case, this may decrease the size of the output set exponentially; in our experiments on practical data sets, mining maximal frequent subgraphs reduces the total number of mined patterns by two to three orders of magnitude. Our method first mines all frequent trees from a general graph database and then reconstructs all maximal subgraphs from the mined trees. Using two chemical structure benchmarks and a set of synthetic graph data sets, we demonstrate that, in addition to decreasing the output size, our algorithm can achieve a five-fold speedup over the current state-of-the-art subgraph mining algorithms.
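
To make the notion of a maximal frequent subgraph concrete, the following is a minimal Python sketch of the definition only. It is not the SPIN algorithm: it does not mine trees or reconstruct subgraphs, and it relies on a brute-force subgraph test that is practical only for toy inputs. The graph encoding, the helper names (`make_graph`, `is_subgraph`, `support`, `maximal_patterns`), the toy database, and the support threshold are all assumptions introduced for illustration.

```python
from itertools import permutations

# Assumed encoding (not from the paper): a graph is a pair (labels, edges),
# where labels is a tuple of vertex labels and edges is a frozenset of
# two-element frozensets over vertex indices (undirected, no self-loops).
def make_graph(labels, edges):
    return (tuple(labels), frozenset(frozenset(e) for e in edges))

def is_subgraph(small, big):
    """Brute-force, label-preserving subgraph test (not necessarily induced)."""
    s_labels, s_edges = small
    b_labels, b_edges = big
    if len(s_labels) > len(b_labels) or len(s_edges) > len(b_edges):
        return False
    # Try every injective mapping of small's vertices onto big's vertices.
    for mapping in permutations(range(len(b_labels)), len(s_labels)):
        if any(s_labels[i] != b_labels[mapping[i]] for i in range(len(s_labels))):
            continue
        if all(frozenset(mapping[v] for v in e) in b_edges for e in s_edges):
            return True
    return False

def support(pattern, database):
    """Number of database graphs that contain the pattern."""
    return sum(is_subgraph(pattern, g) for g in database)

def maximal_patterns(frequent):
    """Keep patterns that are not a proper subgraph of another frequent pattern."""
    return [
        p for p in frequent
        if not any(is_subgraph(p, q) and not is_subgraph(q, p) for q in frequent)
    ]

# Hypothetical toy database of three small labeled graphs.
triangle = make_graph("CCO", [(0, 1), (1, 2), (0, 2)])
database = [
    triangle,
    make_graph("CCON", [(0, 1), (1, 2), (0, 2), (2, 3)]),  # triangle plus a pendant N
    make_graph("CCO", [(0, 1), (1, 2)]),                   # path C-C-O
]

# Hand-picked candidate patterns; a real miner would enumerate these itself.
cc_edge = make_graph("CC", [(0, 1)])
co_edge = make_graph("CO", [(0, 1)])
path = make_graph("CCO", [(0, 1), (1, 2)])
candidates = [cc_edge, co_edge, path, triangle]

min_support = 2  # assumed threshold
frequent = [p for p in candidates if support(p, database) >= min_support]
maximal = maximal_patterns(frequent)
```

In this toy setting all four candidates are frequent, but only the triangle is maximal, because both edges and the path are contained in it. This is the output reduction the abstract describes, shown at a very small scale.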