Frequent subgraph mining is a long-standing problem in data mining. Several frequent graph mining methods have been developed for mining graph transactions. However, these methods become less usable when the dataset is a single large graph, and when that graph is too large to fit in main memory, alternative techniques are needed to find frequent subgraphs efficiently. We investigate frequent subgraph mining on a single large graph using sampling and find that sampling is a feasible approach for this task. We evaluate different sampling methods and propose a novel method, 'random areas selection sampling', which, with customized parameters, produces better results than all current graph sampling approaches.