An Efficient Algorithm for Discovering Frequent Subgraphs

Authors:
Michihiro Kuramochi; George Karypis
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2004

Abstract

Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to nontraditional domains, existing frequent pattern discovery approaches cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the data sets in these domains. An alternate way of modeling the objects in these data sets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper, we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph data sets. We experimentally evaluate the performance of FSG using a variety of real and synthetic data sets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in data sets containing more than 200,000 graph transactions and scales linearly with respect to the size of the data set.
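
To make the problem formulation concrete, the following is a minimal sketch of what "a subgraph that occurs frequently over the entire set of graphs" means. It is not the FSG algorithm: it checks containment by brute force over vertex mappings, which is only practical for very small graphs. The graph representation, the function names (`make_graph`, `contains`, `support`), and the toy data are assumptions made for illustration and do not come from the paper.

```python
from itertools import permutations


def make_graph(vertex_labels, edges):
    """Build an undirected labeled graph (hypothetical representation).

    vertex_labels: dict mapping vertex id -> label
    edges: iterable of (u, v, edge_label) triples; labels must not be None
    """
    return vertex_labels, {frozenset((u, v)): lab for u, v, lab in edges}


def contains(graph, pattern):
    """Return True if `pattern` occurs in `graph` as a (non-induced) subgraph.

    Brute force: try every injective mapping of pattern vertices onto graph
    vertices and check that vertex labels and edge labels are preserved.
    """
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    for image in permutations(g_labels, len(p_nodes)):
        m = dict(zip(p_nodes, image))
        if any(p_labels[v] != g_labels[m[v]] for v in p_nodes):
            continue
        if all(
            g_edges.get(frozenset(m[x] for x in edge)) == lab
            for edge, lab in p_edges.items()
        ):
            return True
    return False


def support(database, pattern):
    """Number of graph transactions that contain the pattern at least once."""
    return sum(contains(g, pattern) for g in database)


# Toy database of three small graph transactions (made-up data).
db = [
    make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "single")]),
    make_graph({0: "C", 1: "O", 2: "N"}, [(0, 1, "single"), (0, 2, "single")]),
    make_graph({0: "C", 1: "O"}, [(0, 1, "double")]),
]

# Candidate patterns: a C-O single bond and a C-C single bond.
c_o = make_graph({0: "C", 1: "O"}, [(0, 1, "single")])
c_c = make_graph({0: "C", 1: "C"}, [(0, 1, "single")])

min_support = 2
print(support(db, c_o), support(db, c_o) >= min_support)  # 2 True
print(support(db, c_c), support(db, c_c) >= min_support)  # 1 False
```

In this sketch a pattern counts once per transaction no matter how many times it appears inside that graph, mirroring how itemset support is counted per transaction. Enumerating candidate patterns and avoiding repeated isomorphism checks is the hard part that an algorithm such as FSG has to address, and nothing here attempts it.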