Efficient discovery of frequent itemsets in large datasets is a key component of many data mining tasks. In-core algorithms, which operate entirely in main memory and avoid expensive disk accesses, are generally among the most efficient of the available algorithms; this holds in particular for the prefix-tree-based algorithm FP-growth. Unfortunately, their excessive memory requirements make them inapplicable to large datasets with many distinct items and/or itemsets of high cardinality. To overcome this limitation, we propose two novel data structures, the CFP-tree and the CFP-array, which reduce memory consumption by about an order of magnitude. This allows us to process significantly larger datasets in main memory than was previously possible. Our data structures are based on structural modifications of the prefix tree that increase compressibility, an optimized physical representation, lightweight compression techniques, and intelligent node ordering and indexing. Experiments with both real-world and synthetic datasets show the effectiveness of our approach.
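The approach starts from the prefix tree (FP-tree) that FP-growth builds. The following is a minimal sketch of a plain FP-tree, not the CFP-tree. It omits the header table and node links that FP-growth's mining phase also needs, and the names `FPNode`, `build_fp_tree`, and `count_nodes` are hypothetical. The sketch shows the property the compressed structures build on. Sorting each transaction's frequent items by descending support lets transactions share prefixes, so one node can stand for many item occurrences.

```python
from collections import Counter


class FPNode:
    """A node of a plain FP-tree (prefix tree): one item plus the number of
    transactions whose sorted item list passes through this node."""

    __slots__ = ("item", "count", "children")

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_support):
    # Pass 1: count the support of each item and keep only frequent items.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    # Pass 2: insert each transaction with its frequent items sorted by
    # descending support (ties broken by item) so common prefixes are shared.
    root = FPNode()
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item)
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    # Number of nodes below `node` (the root itself is not counted).
    return sum(1 + count_nodes(c) for c in node.children.values())


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "c"],
          ["a", "b", "c", "d"]]
    root, freq = build_fp_tree(db, min_support=2)
    # 14 item occurrences are stored in 8 tree nodes thanks to shared prefixes.
    print(count_nodes(root), sum(len(t) for t in db))
```

Each node here is a full Python object with its own dictionary of children. That per-node overhead, multiplied over millions of nodes, is the kind of memory cost that a more compact physical representation aims to cut.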
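The abstract mentions "lightweight compression techniques" without naming them. One widely used lightweight scheme is delta encoding of sorted integer IDs followed by variable-byte (varint) coding of the gaps. The sketch below shows that scheme purely as an illustration. It is an assumption, not the encoding actually used in the CFP-tree or CFP-array, and all function names are hypothetical.

```python
def encode_varint(n):
    """Variable-byte encode a non-negative integer: 7 payload bits per byte,
    high bit set on every byte except the last (least significant group first)."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_varints(data):
    """Decode a concatenation of variable-byte integers."""
    values, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # continuation bit set: more bytes follow
        else:
            values.append(n)    # last byte of this integer
            n, shift = 0, 0
    return values


def encode_sorted_ids(ids):
    """Delta-encode a sorted list of item IDs, then varint-encode the gaps."""
    out, prev = bytearray(), 0
    for i in ids:
        out += encode_varint(i - prev)
        prev = i
    return bytes(out)


def decode_sorted_ids(data):
    """Invert encode_sorted_ids by summing the decoded gaps."""
    ids, total = [], 0
    for gap in decode_varints(data):
        total += gap
        ids.append(total)
    return ids


if __name__ == "__main__":
    ids = [3, 130, 131, 100000]
    enc = encode_sorted_ids(ids)
    # Gaps 3, 127, 1, 99869 take 1 + 1 + 1 + 3 = 6 bytes instead of 16 bytes
    # for four fixed-width 32-bit integers.
    print(len(enc), decode_sorted_ids(enc))
```

Small values such as gaps between nearby IDs or low node counts dominate in practice, so they fit in a single byte. Decoding needs only shifts and masks, which is what makes such schemes "lightweight" compared with general-purpose compressors.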