COFI approach for mining frequent itemsets revisited

Authors:
Mohammad El-Hajj;Osmar R. Zaïane
Affiliations:
University of Alberta, Edmonton, AB, Canada;University of Alberta, Edmonton, AB, Canada
Venue:
Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Year:
2004

Citing 10
Cited 4

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Inductive databases and condensed representations for data mining (extended abstract)

ILPS '97 Proceedings of the 1997 international symposium on Logic programming
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Scalable Algorithms for Association Mining

IEEE Transactions on Knowledge and Data Engineering
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases

Proceedings of the 17th International Conference on Data Engineering
Efficiently Mining Maximal Frequent Itemsets

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Frequent term-based text clustering

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

DRFP-tree: disk-resident frequent pattern tree

Applied Intelligence
A persistent HY-Tree to efficiently support itemset mining on large datasets

Proceedings of the 2010 ACM Symposium on Applied Computing
Mining with constraints by pruning and avoiding ineffectual processing

AI'05 Proceedings of the 18th Australian Joint conference on Advances in Artificial Intelligence
Relevance of counting in data mining tasks

ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

The COFI approach for mining frequent itemsets, introduced recently, is an efficient algorithm that was demonstrated to outperform state-of-the-art algorithms on synthetic data. For instance, COFI is not only one order of magnitude faster and requires significantly less memory than the popular FP-Growth, it is also very effective with extremely large datasets, better than any reported algorithm. However, COFI has a significant drawback when mining dense transactional databases which is the case with some real datasets. The algorithm performs poorly in these cases because it ends up generating too many local candidates that are doomed to be infrequent. In this paper, we present a new algorithm COFI* for mining frequent itemsets. This novel algorithm uses the same data structure COFI-tree as its predecessor, but partitions the patterns in such a way to avoid the drawbacks of COFI. Moreover, its approach uses a pseudo-Oracle to pinpoint the maximal itemsets, from which all frequent itemsets are derived and counted, avoiding the generation of candidates fated infrequent. Our implementation tested on real and synthetic data shows that COFI* algorithm outperforms state-of-the-art algorithms, among them COFI itself.