Association Rule Mining (ARM) is a data mining technique for extracting hidden knowledge from datasets, which an organization's decision makers can use to improve overall profit. However, ARM requires repeated passes over the entire database, so for large databases the input/output overhead of scanning is very significant. A popular way to speed up ARM is to apply the mining algorithm to a sample instead of the entire database. In this paper, a parameterized sampling algorithm for ARM is presented. The algorithm extracts sample datasets based on three parameters: transaction frequency, transaction length, and transaction frequency-length. To evaluate its performance and accuracy, it is compared against a two-phase sampling-based algorithm on real and synthetic datasets. The experimental results show that the proposed sampling algorithm outperforms the two-phase sampling algorithm in some cases and achieves up to 98% accuracy.
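
The general idea of mining a sample rather than the full database can be illustrated with a minimal Python sketch. It is not the parameterized algorithm of the paper, whose details the abstract does not give: it draws a plain uniform random sample, counts itemsets by brute force, and scores agreement with a Jaccard overlap. The function names `frequent_itemsets`, `simple_random_sample`, and `itemset_accuracy`, the `max_size` cap, and the accuracy measure itself are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of 1..max_size items.

    Brute-force counting in a single pass; fine for a toy example,
    not a substitute for Apriori or FP-growth on real data.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order, duplicates dropped
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def simple_random_sample(transactions, fraction, seed=0):
    """Uniform sample without replacement (assumed baseline sampler)."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(transactions)))
    return rng.sample(transactions, k)


def itemset_accuracy(full, sampled):
    """Jaccard overlap of the two frequent-itemset collections.

    This is one plausible accuracy measure, chosen here as an assumption.
    """
    a, b = set(full), set(sampled)
    return len(a & b) / len(a | b) if a | b else 1.0


if __name__ == "__main__":
    # Hypothetical toy database: four basket shapes repeated 50 times.
    db = [["bread", "milk"], ["bread", "butter"],
          ["milk", "butter", "bread"], ["milk"]] * 50
    full = frequent_itemsets(db, min_support=0.4)
    approx = frequent_itemsets(simple_random_sample(db, 0.2), min_support=0.4)
    print(itemset_accuracy(full, approx))
```

Mining the sample scans only a fraction of the transactions, which is where the I/O saving comes from; the overlap score gives a rough sense of how many frequent itemsets are missed or spuriously reported as a result.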