COFI approach for mining frequent itemsets revisited

  • Authors:
  • Mohammad El-Hajj; Osmar R. Zaïane

  • Affiliations:
  • University of Alberta, Edmonton, AB, Canada; University of Alberta, Edmonton, AB, Canada

  • Venue:
  • Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
  • Year:
  • 2004

Abstract

The recently introduced COFI approach for mining frequent itemsets is an efficient algorithm that has been shown to outperform state-of-the-art algorithms on synthetic data. For instance, COFI is not only an order of magnitude faster than the popular FP-Growth while requiring significantly less memory, it is also very effective on extremely large datasets, more so than any reported algorithm. However, COFI has a significant drawback when mining dense transactional databases, which is the case with some real datasets. The algorithm performs poorly in these cases because it generates too many local candidates that are doomed to be infrequent. In this paper, we present a new algorithm, COFI*, for mining frequent itemsets. This novel algorithm uses the same data structure as its predecessor, the COFI-tree, but partitions the patterns in a way that avoids the drawbacks of COFI. Moreover, it uses a pseudo-Oracle to pinpoint the maximal itemsets, from which all frequent itemsets are derived and counted, avoiding the generation of candidates that are fated to be infrequent. Our implementation, tested on real and synthetic data, shows that the COFI* algorithm outperforms state-of-the-art algorithms, among them COFI itself.
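
The idea of deriving all frequent itemsets from the maximal ones can be illustrated with a minimal sketch. This is not the COFI* algorithm: it uses no COFI-tree and no pseudo-Oracle, and it assumes the maximal itemsets are already known. The function name `derive_frequent_itemsets` and the toy transactions are hypothetical, chosen only for illustration. Because every subset of a frequent itemset is itself frequent, enumerating the subsets of the maximal itemsets never produces an infrequent candidate.

```python
from itertools import combinations


def derive_frequent_itemsets(transactions, maximal_itemsets):
    """Enumerate all non-empty subsets of the given maximal itemsets
    and count the support of each by a brute-force scan.

    Illustrative only: COFI* counts supports with its COFI-tree
    structures rather than by rescanning the transactions.
    """
    # Collect each subset once, even if it is shared by several maximal itemsets.
    candidates = set()
    for maximal in maximal_itemsets:
        items = sorted(maximal)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                candidates.add(frozenset(subset))

    # Support = number of transactions that contain the itemset.
    return {
        itemset: sum(1 for t in transactions if itemset <= t)
        for itemset in candidates
    }


# Hypothetical toy data with a minimum support of 2.
transactions = [
    frozenset(t)
    for t in (
        {"a", "b", "c"},
        {"a", "b", "c"},
        {"a", "b"},
        {"b", "d"},
        {"b", "d"},
    )
]
# Assumed to be given; in COFI* the pseudo-Oracle is what locates these.
maximal_itemsets = [frozenset({"a", "b", "c"}), frozenset({"b", "d"})]

supports = derive_frequent_itemsets(transactions, maximal_itemsets)
for itemset, count in sorted(supports.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy example every enumerated itemset meets the minimum support, so no counting effort is spent on candidates that turn out to be infrequent. The cost that remains is finding the maximal itemsets in the first place, which is the part the abstract attributes to the pseudo-Oracle.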