Discovering association rules in a very large database is time consuming, and researchers have developed many algorithms to improve efficiency. Sampling can significantly reduce the cost of mining, since the mining algorithms need to deal with only a small dataset compared to the original database. In particular, if data arrives as a stream at a faster rate than it can be processed, sampling seems to be the only choice. How to sample the data, and how large the sample should be for a given error bound and confidence level, are key issues for particular data mining tasks. In this paper, we derive the sufficient sample size based on the central limit theorem for sampling large datasets with replacement. This approach requires a smaller sample size than that based on Chernoff bounds and is effective for association rule mining. The effectiveness of the method has been evaluated on both dense and sparse datasets.
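
The abstract does not state the formulas themselves, so the following is only an illustrative sketch of the comparison it describes. It assumes the standard textbook forms: a normal-approximation (central limit theorem) bound n >= z^2 * p(1 - p) / eps^2, taken at the worst case p = 0.5, and the additive Chernoff (Hoeffding) bound n >= ln(2 / delta) / (2 * eps^2). The function names, and the symbols eps (error bound on an itemset's support) and delta (one minus the confidence level), are hypothetical and may differ from the paper's own derivation.

```python
import math
from statistics import NormalDist


def clt_sample_size(eps: float, delta: float, p: float = 0.5) -> int:
    """Sample size from the normal approximation (assumed textbook form).

    eps   -- allowed absolute error on the estimated support
    delta -- allowed probability of exceeding that error
    p     -- true support; 0.5 is the worst case, maximizing p * (1 - p)
    """
    # Two-sided critical value z such that P(|Z| > z) = delta.
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)
    return math.ceil(z * z * p * (1.0 - p) / (eps * eps))


def chernoff_sample_size(eps: float, delta: float) -> int:
    """Sample size from the additive Chernoff (Hoeffding) bound (assumed form)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


if __name__ == "__main__":
    eps, delta = 0.01, 0.05  # 1% support error at 95% confidence
    print("CLT-based sample size:     ", clt_sample_size(eps, delta))
    print("Chernoff-based sample size:", chernoff_sample_size(eps, delta))
```

Under these assumptions, a 1% error bound at 95% confidence gives a central-limit-theorem sample of about half the size of the Chernoff-based one, which is consistent with the abstract's claim. The normal approximation is asymptotic rather than a strict guarantee, so it trades some rigor for the smaller sample.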