TFP: An Efficient Algorithm for Mining Top-K Frequent Closed Itemsets

Authors:
Jianyong Wang;Jiawei Han;Ying Lu;Petre Tzvetkov
Affiliations:
-;IEEE;-;-
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2005


Abstract

Frequent itemset mining has been studied extensively in the literature. Most previous studies require the specification of a min_support threshold and aim at mining the complete set of frequent itemsets satisfying min_support. However, in practice, it is difficult for users to provide an appropriate min_support threshold. In addition, a complete set of frequent itemsets is much less compact than a set of frequent closed itemsets. In this paper, we propose an alternative mining task: mining the top-k frequent closed itemsets of length no less than min_l, where k is the desired number of frequent closed itemsets to be mined and min_l is the minimal length of each itemset. An efficient algorithm, called TFP, is developed for mining such itemsets without requiring min_support to be specified. Starting at min_support = 0, and by making use of the length constraint and the properties of top-k frequent closed itemsets, min_support can be raised effectively and the FP-Tree can be pruned dynamically both during and after its construction using our two proposed methods: the closed node count and descendant_sum. Moreover, mining is further sped up by employing a combined top-down and bottom-up FP-Tree traversal strategy, a set of search space pruning methods, a fast two-level hash-indexed result tree, and a novel closed itemset verification scheme. Our extensive performance study shows that TFP has high performance and linear scalability in terms of the database size.
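
To make the mining task concrete, the following is a minimal brute-force sketch of what "top-k frequent closed itemsets of length no less than min_l" means on a toy database. It is not the TFP algorithm: it builds no FP-Tree and does no pruning, and it is exponential in the number of items. The function name `top_k_closed_itemsets`, the tie-breaking rule (lexicographic order among equal supports), and the sample transactions are assumptions made for illustration only.

```python
from itertools import combinations


def top_k_closed_itemsets(transactions, k, min_l):
    """Return up to k closed itemsets of length >= min_l, highest support first.

    Brute-force reference for the task definition only (hypothetical helper,
    not the TFP algorithm described in the abstract).
    """
    txns = [frozenset(t) for t in transactions]
    items = sorted(set().union(*txns))
    closed = {}
    for size in range(max(min_l, 1), len(items) + 1):
        for cand in combinations(items, size):
            cand_set = frozenset(cand)
            # Transactions that contain every item of the candidate.
            cover = [t for t in txns if cand_set <= t]
            if not cover:
                continue
            # An itemset is closed when it equals the intersection of all
            # transactions that contain it (no superset has the same support).
            if frozenset.intersection(*cover) == cand_set:
                closed[cand] = len(cover)
    # Assumed tie-break: equal supports are ordered lexicographically.
    ranked = sorted(closed.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"b", "c"},
]
print(top_k_closed_itemsets(transactions, k=2, min_l=2))
```

In this toy database, {c} is not closed because every transaction containing c also contains b, so {b, c} has the same support. Note that no support threshold is supplied: the support of the k-th ranked itemset is the effective threshold, which corresponds to the idea of raising min_support from 0 as candidates are found.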