Mining top-k frequent patterns in the presence of the memory constraint

Authors:
Kun-Ta Chuang;Jiun-Long Huang;Ming-Syan Chen
Affiliations:
Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC;Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, ROC;Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2008


Abstract

In this paper we explore a practically interesting mining task: retrieving the top-k (closed) itemsets in the presence of a memory constraint. Specifically, as opposed to most previous works, which concentrate on improving mining efficiency or on reducing memory usage on a best-effort basis, we first attempt to specify an upper bound on the memory that frequent-itemset mining may use. To comply with this upper bound, two efficient algorithms, MTK and MTK_Close, are devised for mining frequent itemsets and closed itemsets, respectively, without requiring the user to specify a subtle minimum support threshold. Instead, users need only give a more human-understandable parameter, namely the desired number k of frequent (closed) itemsets. In practice, it is quite challenging to constrain memory consumption while also retrieving the top-k itemsets efficiently. To achieve this, MTK and MTK_Close are devised as level-wise search algorithms in which the number of candidates generated and tested in each database scan is limited. A novel search approach, called δ-stair search, is used in MTK and MTK_Close to allocate the available memory effectively among candidate itemsets of various lengths, which leads to a small number of required database scans. As demonstrated in the empirical study on real and synthetic data, rather than merely offering the flexibility to trade execution efficiency against memory consumption, MTK and MTK_Close achieve high efficiency while keeping memory consumption within the specified bound, which makes them practical algorithms for mining frequent patterns.
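
The general idea in the abstract, a level-wise top-k search that needs no minimum support and limits how many candidates are counted per database scan, can be illustrated with a minimal sketch. This is not the authors' MTK or MTK_Close and it does not implement their stair search. The function names, the `max_candidates` parameter, and the batching scheme are assumptions made for illustration. The cap applies only to the candidate counters held during a single scan; `max_candidates` is assumed to be at least 1, and ties in support are broken by item order.

```python
import heapq
from itertools import combinations


def count_supports(transactions, candidates):
    """One database scan: count the transactions containing each candidate."""
    counts = dict.fromkeys(candidates, 0)
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def top_k_itemsets(transactions, k, max_candidates):
    """Illustrative level-wise top-k miner (hypothetical, not MTK).

    At most `max_candidates` candidates are counted in any single scan.
    Returns ([(itemset_tuple, support), ...], number_of_scans).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]  # length-1 candidates
    heap = []  # min-heap of (support, itemset) holding the best k so far
    scans = 0
    while level:
        supports = {}
        # Count the current level in batches so each scan respects the cap.
        for start in range(0, len(level), max_candidates):
            batch = level[start:start + max_candidates]
            supports.update(count_supports(transactions, batch))
            scans += 1
        for c, s in supports.items():
            if s == 0:
                continue  # never report itemsets that do not occur
            entry = (s, tuple(sorted(c)))
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)
        # The k-th best support so far acts as a rising support threshold.
        threshold = heap[0][0] if len(heap) == k else 1
        survivors = {c for c, s in supports.items() if s >= threshold}
        size = len(level[0]) + 1
        # Apriori-style join, then prune candidates with a non-surviving subset.
        joined = {a | b for a, b in combinations(survivors, 2)
                  if len(a | b) == size}
        level = sorted(
            (c for c in joined
             if all(frozenset(sub) in survivors
                    for sub in combinations(c, size - 1))),
            key=sorted,
        )
    result = [(itemset, s) for s, itemset in sorted(heap, reverse=True)]
    return result, scans


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}, {"b", "c"}]
    patterns, scans = top_k_itemsets(db, k=4, max_candidates=2)
    print(patterns, scans)
```

In this sketch a smaller `max_candidates` lowers the number of counters held per scan at the cost of more scans, which is the tension between memory and efficiency that the abstract describes.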