Effective sampling for mining association rules

Authors:
Yanrong Li;Raj P. Gopalan
Affiliations:
Department of Computing, Curtin University of Technology, Bentley, Western Australia;Department of Computing, Curtin University of Technology, Bentley, Western Australia
Venue:
AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence
Year:
2004

Abstract

Because discovering association rules in a very large database is time-consuming, researchers have developed many algorithms to improve efficiency. Sampling can significantly reduce the cost of mining, since the mining algorithm needs to process only a small dataset compared with the original database. In particular, when data arrive as a stream faster than they can be processed, sampling seems to be the only choice. How to sample the data, and how large the sample should be for a given error bound and confidence level, are key issues for particular data mining tasks. In this paper, we derive the sufficient sample size, based on the central limit theorem, for sampling large datasets with replacement. This approach requires a smaller sample size than one based on Chernoff bounds and is effective for association rule mining. The effectiveness of the method has been evaluated on both dense and sparse datasets.
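
To give a rough sense of the comparison made in the abstract, the sketch below computes two textbook sample-size bounds for estimating an itemset's support to within an error eps at confidence 1 - delta. The formulas are assumptions, not taken from the paper, whose exact derivation is not reproduced in this record: a normal-approximation (central limit theorem) bound, n >= z^2 p(1 - p) / eps^2, and an additive Chernoff (Hoeffding) bound, n >= ln(2 / delta) / (2 eps^2). The function names and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def clt_sample_size(eps, delta, p=0.5):
    """Normal-approximation sample size (assumed textbook form).

    eps   -- tolerated absolute error in the estimated support
    delta -- allowed probability of exceeding that error
    p     -- true support; 0.5 is the worst case (largest variance)
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - delta / 2)
    return math.ceil(z * z * p * (1 - p) / (eps * eps))


def chernoff_sample_size(eps, delta):
    """Additive Chernoff (Hoeffding) sample size (assumed textbook form)."""
    return math.ceil(math.log(2 / delta) / (2 * eps * eps))


if __name__ == "__main__":
    eps, delta = 0.01, 0.05
    print("CLT-based bound:     ", clt_sample_size(eps, delta))
    print("Chernoff-based bound:", chernoff_sample_size(eps, delta))
```

Under these assumed formulas the normal-approximation bound is the smaller of the two for the same eps and delta, which is consistent with the abstract's claim. The normal approximation is only asymptotically valid, whereas the Chernoff bound holds for any sample size.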