In recent years, the problem of assessing the statistical significance of frequent patterns extracted from a given dataset S has received much attention. Because S is always a sample drawn from an unknown underlying distribution, two types of error can arise during frequent pattern mining: accepting a false frequent pattern or rejecting a true one. Many approaches in the literature treat the dataset size as a fixed, application-dependent parameter. In that setting there is a trade-off between the two errors, and the resulting solutions control one risk at the expense of the other. Sampling-based methods, on the other hand, have attempted to determine the optimal size of S that ensures a good approximation of the original (potentially infinite) database from which S is drawn. However, these approaches often rely on Chernoff bounds, which do not allow the two risks to be controlled independently. In this paper, we overcome these drawbacks by providing a lower bound on the sample size required to control both risks and thus carry out statistically significant frequent pattern mining.
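
To make the limitation of Chernoff-type bounds concrete, here is a minimal sketch of the kind of sample-size rule used by earlier sampling-based methods. It is not the bound proposed in this paper. It assumes the standard two-sided Hoeffding inequality applied to the frequency of a single pattern; the function name `hoeffding_sample_size` and the parameters `epsilon` (tolerated frequency error) and `delta` (failure probability) are illustrative choices.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n such that 2 * exp(-2 * n * epsilon**2) <= delta.

    With n i.i.d. transactions, the observed frequency of one fixed
    pattern then deviates from its true frequency by more than
    epsilon with probability at most delta (two-sided Hoeffding bound).
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie in (0, 1)")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Illustrative values only: 1% frequency error, 5% failure probability.
    n = hoeffding_sample_size(epsilon=0.01, delta=0.05)
    print(n)
```

In this sketch a single pair `(epsilon, delta)` covers overestimation and underestimation at once, so the risk of accepting a false frequent pattern and the risk of rejecting a true one cannot be set separately.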