Frequent subgraph mining on a single large graph using sampling techniques

Authors:
Ruoyu Zou;Lawrence B. Holder
Affiliations:
Washington State University, Pullman, WA;Washington State University, Pullman, WA
Venue:
Proceedings of the Eighth Workshop on Mining and Learning with Graphs
Year:
2010

Abstract

Frequent subgraph mining is a long-standing problem in data mining. Several frequent graph mining methods have been developed for mining graph transactions. However, these methods become less usable when the dataset is a single large graph. Moreover, when the graph is too large to fit in main memory, alternative techniques are needed to find frequent subgraphs efficiently. We investigate the task of frequent subgraph mining on a single large graph using sampling approaches and find that sampling is a feasible approach for this task. We evaluate different sampling methods and propose a novel sampling method called 'random areas selection sampling', which produces better results than all the current graph sampling approaches with customized parameters.
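
The abstract names the sampling method but does not describe its steps. As a rough illustration of the general idea of sampling a single large graph by areas, the sketch below picks random start vertices and grows a small breadth-first neighborhood around each, then keeps the subgraph induced by the union of those neighborhoods. This is only one plausible reading of the method's name, not the authors' algorithm; the function names and the parameters num_areas and area_size are hypothetical.

```python
import random
from collections import deque


def sample_random_areas(adj, num_areas, area_size, seed=None):
    """Return a set of vertices made of BFS-grown areas around random starts.

    adj: dict mapping each vertex to a list of its neighbors (undirected graph).
    num_areas, area_size: hypothetical tuning parameters, not from the paper.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    sampled = set()
    for _ in range(num_areas):
        start = rng.choice(vertices)
        area = {start}
        queue = deque([start])
        # Grow the area breadth-first until it reaches area_size vertices
        # or the connected component around the start is exhausted.
        while queue and len(area) < area_size:
            v = queue.popleft()
            for w in adj[v]:
                if w not in area:
                    area.add(w)
                    queue.append(w)
                    if len(area) >= area_size:
                        break
        sampled |= area
    return sampled


def induced_subgraph(adj, keep):
    """Keep only the chosen vertices and the edges running between them."""
    return {v: [w for w in adj[v] if w in keep] for v in keep}


if __name__ == "__main__":
    # Toy input: a ring of 20 vertices.
    ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
    keep = sample_random_areas(ring, num_areas=3, area_size=4, seed=0)
    print(sorted(keep))
    print(induced_subgraph(ring, keep))
```

A frequent subgraph miner would then run on the much smaller induced subgraph rather than on the full graph. How many areas to draw and how large each should be are the kind of parameters that would need tuning per dataset.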