CT-ITL: efficient frequent item set mining using a compressed prefix tree with pattern growth

  • Authors: Yudho Giri Sucahyo; Raj P. Gopalan
  • Affiliations: School of Computing, Curtin University of Technology, Bentley, Western Australia (both authors)
  • Venue: ADC '03 Proceedings of the 14th Australasian database conference - Volume 17
  • Year: 2003

Abstract

Discovering association rules that identify relationships among sets of items is an important problem in data mining. Finding frequent item sets is computationally the most expensive step in association rule discovery, and it has therefore attracted significant research attention. In this paper, we present a more efficient algorithm for mining the complete set of frequent item sets. In designing our algorithm, we have modified and synthesized a number of useful ideas, including prefix trees, pattern growth, and tid-intersection. We extend the prefix-tree structure to store transaction groups and propose a new method to compress the tree. Transaction-id intersection is modified to include the count of transaction groups. We present performance comparisons of our algorithm against the fastest Apriori algorithm, Eclat, and OpportuneProject, the latest extension of FP-Growth. To study the trade-offs in compressing transactions in the prefix tree, we compare the performance of our algorithm with and without the modified compressed prefix tree. We have tested all the algorithms on several widely used test data sets. The performance study shows that the new algorithm significantly reduces the processing time for mining frequent item sets from dense data sets that contain relatively long patterns. We discuss the performance results in detail, along with the strengths and limitations of our algorithm.
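
The idea of tid-intersection over transaction groups with counts can be illustrated with a minimal Python sketch. It is not the CT-ITL data structure from the paper: it assumes, for illustration only, that a "transaction group" is a set of identical transactions stored once with a count, and that the support of an item set is the sum of the counts of the groups shared by all its items. The function names `build_group_tidlists`, `intersect`, and `support` are hypothetical.

```python
from collections import Counter


def build_group_tidlists(transactions):
    """Map each item to {group_id: count} (illustrative assumption, not CT-ITL itself)."""
    # Identical transactions collapse into one group that carries a count.
    groups = Counter(frozenset(t) for t in transactions)
    tidlists = {}
    for gid, (items, count) in enumerate(groups.items()):
        for item in items:
            tidlists.setdefault(item, {})[gid] = count
    return tidlists


def intersect(a, b):
    """Keep the group ids present in both lists, along with each group's count."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter list
    return {gid: cnt for gid, cnt in a.items() if gid in b}


def support(tidlist):
    """Support is the sum of group counts, not the number of list entries."""
    return sum(tidlist.values())


transactions = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "b"],
    ["b", "c"],
    ["a", "b", "c"],
]
tidlists = build_group_tidlists(transactions)
ab = intersect(tidlists["a"], tidlists["b"])
abc = intersect(ab, tidlists["c"])
print(support(ab), support(abc))  # prints: 4 3
```

In this toy input, five transactions collapse into three groups, so the intersected lists hold at most three entries while the summed counts still give the true supports. How much such grouping helps on real data depends on how often transactions coincide, which is plausibly why the abstract reports gains on dense data sets.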