Out-of-core frequent pattern mining on a commodity PC

Authors:
Gregory Buehrer;Srinivasan Parthasarathy;Amol Ghoting
Affiliations:
Ohio State University, Columbus, OH;Ohio State University, Columbus, OH;Ohio State University, Columbus, OH
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Citing 16
Cited 10

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
An effective hash-based algorithm for mining association rules

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Beyond market baskets: generalizing association rules to correlations

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Efficient mining of emerging patterns: discovering trends and differences

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Managing energy and server resources in hosting centers

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Parallel data mining for association rules on shared memory systems

Knowledge and Information Systems
Discovery of Frequent Episodes in Event Sequences

Data Mining and Knowledge Discovery
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases

Proceedings of the 17th International Conference on Data Engineering
Efficiently Mining Maximal Frequent Itemsets

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Scalable Techniques for Mining Causal Structures

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
An Efficient Algorithm for Mining Association Rules in Large Databases

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Mining Frequent Itemsets from Secondary Memory

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Improving database performance on simultaneous multithreading processors

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Cache-conscious frequent pattern mining on a modern processor

VLDB '05 Proceedings of the 31st international conference on Very large data bases

Toward terabyte pattern mining: an architecture-conscious solution

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Out-of-core coherent closed quasi-clique mining from large dense graph databases

ACM Transactions on Database Systems (TODS)
An integrated, generic approach to pattern mining: data mining template library

Data Mining and Knowledge Discovery
Efficient algorithms for mining constrained frequent patterns from uncertain data

Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data
Mining tree-structured data on multicore systems

Proceedings of the VLDB Endowment
A persistent HY-Tree to efficiently support itemset mining on large datasets

Proceedings of the 2010 ACM Symposium on Applied Computing
Efficient algorithms for the mining of constrained frequent patterns from uncertain data

ACM SIGKDD Explorations Newsletter
Memory-efficient frequent-itemset mining

Proceedings of the 14th International Conference on Extending Database Technology
Stream mining of frequent sets with limited memory

Proceedings of the 28th Annual ACM Symposium on Applied Computing
Mining frequent itemsets from sparse data streams in limited memory environments

WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After presenting a characterization of existing out-of-core frequent itemset mining algorithms and their drawbacks, we introduce our efficient, highly scalable solution. Presented in the context of the FPGrowth algorithm, our technique involves several novel I/O-conscious optimizations, such as approximate hash-based sorting and blocking, and leverages recent architectural advancements in commodity computers, such as 64-bit processing. We evaluate the proposed optimizations on truly large data sets,up to 75GB, and show they yield greater than a 400-fold execution time improvement. Finally, we discuss the impact of this research in the context of other pattern mining challenges, such as sequence mining and graph mining.