CT-ITL: efficient frequent item set mining using a compressed prefix tree with pattern growth

Authors:
Yudho Giri Sucahyo; Raj P. Gopalan
Affiliations:
School of Computing, Curtin University of Technology, Bentley, Western Australia (both authors)
Venue:
ADC '03 Proceedings of the 14th Australasian database conference - Volume 17
Year:
2003


Abstract

Discovering association rules that identify relationships among sets of items is an important problem in data mining. Finding frequent item sets is the most computationally expensive step in association rule discovery and has therefore attracted significant research attention. In this paper, we present a more efficient algorithm for mining the complete set of frequent item sets. In designing the algorithm, we have modified and synthesized a number of useful ideas, including prefix trees, pattern growth, and tid-intersection. We extend the prefix-tree structure to store transaction groups and propose a new method to compress the tree. Transaction-id (tid) intersection is modified to include the count of transaction groups. We present performance comparisons of our algorithm against the fastest Apriori algorithm, Eclat, and OpportuneProject, the latest extension of FP-Growth. To study the trade-offs in compressing transactions in the prefix tree, we compare the performance of our algorithm with and without the modified compressed prefix tree. We have tested all the algorithms on several widely used test datasets. The performance study shows that the new algorithm significantly reduces the processing time for mining frequent item sets from dense data sets that contain relatively long patterns. We discuss the performance results in detail, along with the strengths and limitations of our algorithm.
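
The idea of intersecting transaction-id lists while counting transaction groups can be illustrated with a small sketch. This is not the CT-ITL implementation described in the paper: it omits the compressed prefix tree entirely, and the function names (`group_transactions`, `mine_frequent_itemsets`), the in-memory data layout, and the sample data are all assumptions made for illustration. It merges identical transactions into groups with a count, stores for each item the set of group ids that contain it, and grows patterns depth-first by intersecting those sets, where the support of an item set is the sum of the counts of the groups in the intersection.

```python
from collections import Counter


def group_transactions(transactions, min_support):
    """Drop infrequent items, then merge identical transactions into counted groups."""
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, count in item_counts.items() if count >= min_support}
    groups = Counter(frozenset(i for i in t if i in frequent) for t in transactions)
    groups.pop(frozenset(), None)  # a transaction with no frequent item adds nothing
    # The position in the returned list serves as the (hypothetical) group id.
    return list(groups.items())


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support} using group-id intersection with counts."""
    groups = group_transactions(transactions, min_support)
    counts = [count for _, count in groups]

    # Vertical layout: item -> set of ids of the groups that contain it.
    gid_lists = {}
    for gid, (items, _) in enumerate(groups):
        for item in items:
            gid_lists.setdefault(item, set()).add(gid)

    def support(gids):
        # A group stands for `count` identical transactions, so sum the counts.
        return sum(counts[g] for g in gids)

    result = {}

    def grow(prefix, candidates):
        # Each candidate is (item, ids of the groups containing prefix + item).
        for idx, (item, gids) in enumerate(candidates):
            itemset = prefix + (item,)
            result[itemset] = support(gids)
            extensions = []
            for other, other_gids in candidates[idx + 1:]:
                shared = gids & other_gids
                if support(shared) >= min_support:
                    extensions.append((other, shared))
            if extensions:
                grow(itemset, extensions)

    grow((), sorted(gid_lists.items(), key=lambda kv: kv[0]))
    return result


if __name__ == "__main__":
    sample = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "d"]]
    for itemset, sup in sorted(mine_frequent_itemsets(sample, 2).items()):
        print(itemset, sup)
```

In this sketch, grouping pays off only when many transactions become identical after infrequent items are dropped, which is more likely in dense data; that is consistent with, but does not reproduce, the trade-off the abstract says the paper studies.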