In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After characterizing existing out-of-core frequent itemset mining algorithms and their drawbacks, we introduce an efficient, highly scalable solution. Presented in the context of the FPGrowth algorithm, our technique involves several novel I/O-conscious optimizations, such as approximate hash-based sorting and blocking, and leverages recent architectural advancements in commodity computers, such as 64-bit processing. We evaluate the proposed optimizations on truly large data sets, up to 75 GB, and show that they yield a greater than 400-fold improvement in execution time. Finally, we discuss the impact of this research in the context of other pattern mining challenges, such as sequence mining and graph mining.
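For readers unfamiliar with FPGrowth, the following minimal sketch shows the standard preprocessing pass that the algorithm performs before building its prefix tree: count item supports while reading the data one block at a time, then drop infrequent items and reorder each transaction by descending support. It is an illustration only and does not reproduce the approximate hash-based sorting or blocking optimizations named in the abstract, whose details are not given here. The function names, the block size, the toy database, and the support threshold are all assumptions chosen for the example.

```python
from collections import Counter
from itertools import islice


def read_blocks(transactions, block_size):
    """Yield lists of at most block_size transactions from any iterable.

    Hypothetical helper: stands in for reading fixed-size chunks from disk.
    """
    it = iter(transactions)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def count_items(transactions, block_size=2):
    """First pass: count in how many transactions each item occurs."""
    counts = Counter()
    for block in read_blocks(transactions, block_size):
        for transaction in block:
            # set() so that a repeated item counts once per transaction
            counts.update(set(transaction))
    return counts


def order_transaction(transaction, counts, min_support):
    """Keep frequent items only, sorted by descending support.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    frequent = [i for i in set(transaction) if counts[i] >= min_support]
    return sorted(frequent, key=lambda i: (-counts[i], i))


# Toy database (assumed data, for illustration only).
db = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"], ["a", "c", "e"]]
counts = count_items(db)
ordered = [order_transaction(t, counts, min_support=2) for t in db]
print(ordered)
```

Only one block of transactions is held in memory during counting, which is the basic reason block-wise passes suit data sets that do not fit in main memory. The reordered transactions share common prefixes more often, which is what keeps the FPGrowth prefix tree compact.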