In this paper we identify and explore an important phenomenon: a power-law relationship appears in the distribution of itemset supports. Characterizing this relationship benefits many applications, for example by guiding performance tuning of frequent-itemset mining. However, because the number of itemsets grows explosively, retrieving the characteristics of the power-law relationship from the full distribution of itemset supports is prohibitively expensive. We therefore also propose a valid and cost-effective algorithm, called PPL, that extracts the characteristics of the distribution without first discovering all itemsets. Experimental results demonstrate that PPL extracts the characteristics of the power-law relationship efficiently and with high accuracy.
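To make "distribution of itemset supports" concrete, the following minimal Python sketch enumerates itemsets by brute force on a toy dataset, counts how many itemsets share each support value, and fits a straight line to the log-log points. It is not algorithm PPL; it illustrates the exhaustive approach that the abstract describes as prohibitively expensive at scale. The function names `support_histogram` and `loglog_slope`, the `max_size` cap, and the toy transactions are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
import math


def support_histogram(transactions, max_size=3):
    """Map each support value s to the number of itemsets occurring in exactly s transactions.

    Brute force: every itemset of up to max_size items in every transaction
    is enumerated, so the cost grows combinatorially with transaction length.
    """
    support = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, min(max_size, len(items)) + 1):
            for itemset in combinations(items, k):
                support[itemset] += 1
    return Counter(support.values())


def loglog_slope(hist):
    """Least-squares slope of log(itemset count) against log(support)."""
    points = [(math.log(s), math.log(c)) for s, c in hist.items()]
    if len(points) < 2:
        raise ValueError("need at least two distinct support values")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Toy data (hypothetical); real transaction databases are far larger.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b"]]
hist = support_histogram(transactions)
slope = loglog_slope(hist)
```

If the supports followed a power law, the log-log points would lie close to a straight line and the slope would estimate the exponent. A dataset this small cannot show that; it only shows the bookkeeping involved.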