CFP-tree: A compact disk-based structure for storing and querying frequent itemsets

Authors:
Guimei Liu;Hongjun Lu;Jeffrey Xu Yu
Affiliations:
The Hong Kong University of Science and Technology, Hong Kong;The Hong Kong University of Science and Technology, Hong Kong;The Chinese University of Hong Kong, Hong Kong
Venue:
Information Systems
Year:
2007

Citing 28
Cited 8

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Exploratory mining and pruning optimizations of constrained associations rules

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Optimization of constrained frequent set queries with 2-variable constraints

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Efficient mining of emerging patterns: discovering trends and differences

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Using a knowledge cache for interactive discovery of association rules

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Using association rules for product assortment decisions: a case study

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
An efficient algorithm to update large itemsets with early pruning

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Depth first generation of long patterns

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Can we push more constraints into frequent pattern mining?

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
A condensed representation to find frequent patterns

PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Real world performance of association rule algorithms

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency Queries

Data Mining and Knowledge Discovery
Online Generation of Association Rules

ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Mining Frequent Item Sets with Convertible Constraints

Proceedings of the 17th International Conference on Data Engineering
Mining All Non-derivable Frequent Itemsets

PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Concise Representation of Frequent Patterns Based on Generalized Disjunction-Free Generators

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
DualMiner: a dual-pruning algorithm for itemsets with constraints

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Querying multiple sets of discovered rules

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
On Computing Condensed Frequent Pattern Bases

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Speed-up Iterative Frequent Itemset Mining with Constraint Changes

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Efficient Mining of Constrained Correlated Sets

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
A performance study of four index structures for set-valued attributes of low cardinality

The VLDB Journal — The International Journal on Very Large Data Bases
CLOSET+: searching for the best strategies for mining frequent closed itemsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
On computing, storing and querying frequent patterns

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Carpenter: finding closed patterns in long biological datasets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient dynamic mining of constrained frequent sets

ACM Transactions on Database Systems (TODS)

Efficient single-pass frequent pattern mining using a prefix-tree

Information Sciences: an International Journal
FIUT: A new method for mining frequent itemsets

Information Sciences: an International Journal
An improved association rules mining method

Expert Systems with Applications: An International Journal
Controlling false positives in association rule mining

Proceedings of the VLDB Endowment
Finding minimum representative pattern sets

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
AssocExplorer: an association rule visualization system for exploratory data analysis

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
ShrFP-tree: an efficient tree structure for mining share-frequent patterns

AusDM '08 Proceedings of the 7th Australasian Data Mining Conference - Volume 87
A performance study of three disk-based structures for indexing and querying frequent itemsets

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

Frequent itemset mining is an important problem in the data mining area with a wide range of applications. Many decision support systems need to support online interactive frequent itemset mining, which is a challenging task because frequent itemset mining is a computation intensive repetitive process. One solution is to precompute frequent itemsets. In this paper, we propose a compact disk-based data structure-CFP-tree to store precomputed frequent itemsets on a disk to support online mining requests. The CFP-tree structure effectively utilizes the redundancy in frequent itemsets to save space. The compressing ratio of a CFP-tree can be as high as several thousands or even higher. Efficient algorithms for retrieving frequent itemsets from a CFP-tree, as well as efficient algorithms to construct and maintain a CFP-tree, are developed. Our performance study demonstrates that with a CFP-tree, frequent itemset mining requests can be responded to promptly.