Memory-aware frequent k-itemset mining

Authors:
Maurizio Atzori;Paolo Mancarella;Franco Turini
Affiliations:
Dipartimento di Informatica, University of Pisa, Italy;Dipartimento di Informatica, University of Pisa, Italy;Dipartimento di Informatica, University of Pisa, Italy
Venue:
KDID'05 Proceedings of the 4th international conference on Knowledge Discovery in Inductive Databases
Year:
2005

Abstract

In this paper we show that the well-known problem of computing frequent k-itemsets (i.e., itemsets of cardinality k) in a given dataset can be reduced to the problem of finding iceberg queries from a stream of queries suitably constructed from the original dataset. Hence, algorithms for computing frequent k-itemsets can be obtained by adapting algorithms for computing iceberg queries. We show that, for sparse datasets, this can be done directly, i.e., without generating the frequent x-itemsets for each x < k, as the most common algorithms based on a level-wise approach do. We exploit a recent algorithm for finding iceberg queries and define an algorithm that requires only three sequential passes over the dataset to compute the frequent k-itemsets (even for k > 3). An important feature of the algorithm is that the amount of main memory required can be determined in advance, and it is shown to be very low for sparse datasets. Experiments show that, for very large datasets with millions of small transactions, our proposal outperforms the state-of-the-art algorithms. Furthermore, we sketch a first extension of our algorithm that works over data streams.
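
The reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the iceberg-query algorithm it builds on, so the sketch assumes a Misra-Gries style bounded-counter summary, and the function name `frequent_k_itemsets` and the parameter `min_count` (an absolute support threshold) are hypothetical. Each transaction is expanded into its k-subsets, the first pass measures the length of that stream so the counter budget is known before any counting starts, the second pass keeps a bounded set of candidates, and the third pass counts the surviving candidates exactly.

```python
from itertools import combinations
from math import comb


def frequent_k_itemsets(transactions, k, min_count):
    """Return {k-itemset: support} for itemsets in >= min_count transactions.

    Illustrative sketch only; the counter scheme is an assumed
    Misra-Gries style summary, not the algorithm used in the paper.
    """
    if k < 1 or min_count < 1:
        raise ValueError("k and min_count must be positive")

    # Pass 1: length of the stream of k-subsets. This fixes the memory
    # budget in advance: with stream_len // min_count counters, every
    # k-subset occurring at least min_count times survives pass 2.
    stream_len = sum(comb(len(set(t)), k) for t in transactions)
    budget = stream_len // min_count
    if budget == 0:
        return {}

    # Pass 2: bounded-counter summary over the stream of k-subsets.
    counters = {}
    for t in transactions:
        for subset in combinations(sorted(set(t)), k):
            if subset in counters:
                counters[subset] += 1
            elif len(counters) < budget:
                counters[subset] = 1
            else:
                # No free counter: decrement all, drop those at zero.
                for key in list(counters):
                    counters[key] -= 1
                    if counters[key] == 0:
                        del counters[key]

    # Pass 3: exact supports for the surviving candidates only.
    exact = dict.fromkeys(counters, 0)
    for t in transactions:
        for subset in combinations(sorted(set(t)), k):
            if subset in exact:
                exact[subset] += 1
    return {s: c for s, c in exact.items() if c >= min_count}


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "b", "d"], ["c", "d"]]
    print(frequent_k_itemsets(data, k=2, min_count=2))
```

On the four toy transactions above, the stream of 2-subsets has length 8, so the budget is 4 counters and the only pair returned is `('a', 'b')` with support 3. The sketch also suggests why sparse datasets with small transactions are the favourable case: the number of k-subsets per transaction, and hence the stream length and the counter budget, stays small.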