On exploring the power-law relationship in the itemset support distribution

Authors:
Kun-Ta Chuang;Jiun-Long Huang;Ming-Syan Chen
Affiliations:
Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan, ROC;Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, ROC;Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan, ROC
Venue:
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Year:
2006

Abstract

In this paper, we identify and explore an important phenomenon: a power-law relationship appears in the distribution of itemset supports. Characterizing this relationship benefits many applications, for example by guiding performance tuning in frequent-itemset mining. However, because the number of itemsets grows explosively, extracting the characteristics of the power-law relationship directly from the distribution of itemset supports is prohibitively expensive. We therefore also propose a valid and cost-effective algorithm, called algorithm PPL, that extracts the characteristics of the distribution without first discovering all itemsets. Experimental results demonstrate that algorithm PPL extracts the characteristics of the power-law relationship efficiently and with high accuracy.
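To make the notion of a power-law relationship in itemset supports concrete, the following is a minimal brute-force sketch, not algorithm PPL, whose details the abstract does not give. It assumes that the "distribution of itemset supports" means the number of itemsets observed at each support value, and that a power law would show up as a roughly straight line when the logarithm of that count is plotted against the logarithm of the support. The function names (`itemset_supports`, `support_distribution`, `fit_log_log_slope`) and the toy transactions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def itemset_supports(transactions, max_size):
    """Count, for every itemset of up to max_size items, how many transactions contain it.

    This enumerates all itemsets exhaustively, which is exactly the
    expensive step the abstract says should be avoided on real data.
    """
    supports = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                supports[itemset] += 1
    return supports


def support_distribution(supports):
    """Map each support value s to the number of itemsets whose support is s (assumed definition)."""
    return Counter(supports.values())


def fit_log_log_slope(distribution):
    """Least-squares fit of log(count) = intercept + slope * log(support)."""
    if len(distribution) < 2:
        raise ValueError("need at least two distinct support values to fit a line")
    xs = [math.log(s) for s in distribution]
    ys = [math.log(c) for c in distribution.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data, invented for illustration only.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "d"]]
supports = itemset_supports(transactions, max_size=2)
distribution = support_distribution(supports)
slope, intercept = fit_log_log_slope(distribution)
print(dict(distribution), slope, intercept)
```

On a dataset this small the fit is not meaningful; the point is only the shape of the computation. Because the enumeration cost grows combinatorially with transaction length and `max_size`, a sketch like this is practical only for tiny inputs, which is the motivation the abstract gives for an algorithm that avoids discovering all itemsets first.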