An Efficient Hash-Based Method for Discovering the Maximal Frequent Set

Authors: Don-Lin Yang, Ching-Ting Pan, Yeh-Ching Chung
Venue: COMPSAC '01 Proceedings of the 25th International Computer Software and Applications Conference on Invigorating Software Development
Year: 2001

Abstract

Association rule mining can be divided into two steps. The first step finds all frequent itemsets, that is, itemsets whose number of occurrences is greater than or equal to a user-specified threshold. The second step generates reliable association rules from the frequent itemsets found in the first step. Identifying all frequent itemsets in a large database dominates the overall performance of association rule mining. In this paper, we propose an efficient hash-based method, HMFS, for discovering the maximal frequent itemsets. The HMFS method combines the advantages of the DHP (Direct Hashing and Pruning) and Pincer-Search algorithms, which yields two benefits. First, HMFS can in general reduce the number of database scans. Second, HMFS can filter out infrequent candidate itemsets and use the filtered itemsets to find the maximal frequent itemsets. Together, these two benefits reduce the overall time needed to find the maximal frequent itemsets. In addition, HMFS provides an efficient mechanism for constructing the maximal frequent candidate itemsets, which reduces the search space. We implemented HMFS, DHP, and Pincer-Search on a Pentium III 800 MHz PC. The experimental results show that HMFS outperforms DHP and Pincer-Search in most test cases. In particular, HMFS improves significantly on both algorithms when the database is large and the longest frequent itemset is relatively long.
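The abstract does not spell out the HMFS algorithm itself, so the following is only a minimal Python sketch of the building blocks it mentions, under stated assumptions. It assumes transactions are sets of string items and that the threshold is an absolute support count. The sketch runs a plain bottom-up, level-wise (Apriori-style) search. It adds a DHP-style hash-bucket filter on candidate 2-itemsets, then keeps only the maximal frequent itemsets, which are the frequent itemsets with no frequent proper superset. It is not HMFS, and it omits Pincer-Search's top-down maximal frequent candidate set. The names `mine_maximal`, `bucket_of`, `count_support`, and `num_buckets` are hypothetical.

```python
from itertools import combinations


def bucket_of(pair, num_buckets):
    # Hypothetical hash function: any deterministic (within one run) hash works.
    return hash(pair) % num_buckets


def count_support(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def mine_maximal(transactions, min_sup, num_buckets=7):
    """Level-wise search with a DHP-style 2-itemset filter (not HMFS itself)."""
    transactions = [frozenset(t) for t in transactions]
    # Pass 1: count single items and hash every 2-subset into a bucket table.
    item_counts = {}
    buckets = [0] * num_buckets
    for t in transactions:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
        for pair in combinations(sorted(t), 2):
            buckets[bucket_of(pair, num_buckets)] += 1
    frequent = {frozenset([i]): c for i, c in item_counts.items() if c >= min_sup}
    level = list(frequent)
    k = 2
    while level:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori prune: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        if k == 2:
            # DHP-style prune: a pair whose bucket count is below min_sup
            # cannot be frequent, so drop it before the counting pass.
            candidates = {c for c in candidates
                          if buckets[bucket_of(tuple(sorted(c)), num_buckets)] >= min_sup}
        counts = count_support(transactions, candidates)
        level = [c for c, n in counts.items() if n >= min_sup]
        frequent.update((c, counts[c]) for c in level)
        k += 1
    # Maximal frequent itemsets: frequent sets with no frequent proper superset.
    maximal = [s for s in frequent if not any(s < other for other in frequent)]
    return frequent, maximal


transactions = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "b"}, {"b", "d"}, {"a", "c"}]
frequent, maximal = mine_maximal(transactions, min_sup=2)
print(sorted(sorted(s) for s in maximal))  # [['a', 'b', 'c'], ['b', 'd']]
```

A bucket's count is always at least the support of every pair hashed into it, so the bucket filter never discards a frequent pair. It only removes some infrequent pairs before the more expensive counting pass. With these five transactions and `min_sup=2`, there are nine frequent itemsets, but only {a, b, c} and {b, d} are maximal. Each of the other seven is a subset of one of them, which is why mining only the maximal sets can shrink the output.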