High utility pattern mining using the maximal itemset property and lexicographic tree structures

  • Authors:
  • Ming-Yen Lin; Tzer-Fu Tu; Sue-Chen Hsueh

  • Affiliations:
  • Dept. of Information Engineering and Computer Science, Feng Chia University, 100, Wenhua Road, Xitun, Taichung 407, Taiwan, ROC; Dept. of Information Engineering and Computer Science, Feng Chia University, 100, Wenhua Road, Xitun, Taichung 407, Taiwan, ROC; Dept. of Information Management, Chaoyang University of Technology, 168, Gifeng E. Road, Wufeng, Taichung 413, Taiwan, ROC

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2012

Abstract

The problem of high utility mining is to discover all of the high utility itemsets in a transactional database. Most algorithms find high utility itemsets in two steps: the first step identifies all of the potential itemsets, and the second step determines the high utility itemsets from among those potential itemsets. The large number of potential itemsets generated in the first step is generally the mining bottleneck, so reducing that number can improve mining performance significantly. In this paper, we use a maximal itemset property and propose an algorithm called UMMI (high Utility Mining using the Maximal Itemset property) that significantly reduces the number of potential itemsets in the first step. In the second step, UMMI uses an effective lexicographic tree structure to determine all of the high utility itemsets. In our experiments on synthetic datasets, UMMI generally outperforms the three previously proposed algorithms it is compared against: CTU-PRO, an optimized TWU-mining algorithm, and Two-Phase. On average, UMMI is 5, 3, and 7 times faster than CTU-PRO, TWU-mining, and Two-Phase, respectively. In an experiment on real data, UMMI is 6 times faster than Two-Phase, while the other two algorithms could not complete the mining in a reasonable amount of time. UMMI uses an approximately fixed amount of memory, which is generally less than that used by the other algorithms in each mining run. The experimental results show that the proposed algorithm can mine high utility itemsets efficiently. In addition, UMMI scales linearly with the number of transactions.
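To make the generic two-step scheme concrete, here is a minimal sketch. It assumes the standard definitions from the high utility mining literature. The utility u(X) of an itemset is the sum of quantity × unit profit of X's items over the transactions containing X. Its transaction-weighted utility TWU(X) sums the utilities of those entire transactions. Because TWU(X) ≥ u(X) and TWU never increases as X grows, it can prune Apriori-style. The sketch follows the Two-Phase style: phase 1 keeps every itemset whose TWU meets the threshold, and phase 2 computes exact utilities. It is not UMMI. The `maximal` helper only shows how a downward-closed set of potential itemsets can be summarized by its maximal members, and should not be read as the paper's maximal itemset property or its lexicographic tree. The data, threshold, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity,
# and PROFIT maps item -> unit profit (external utility).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 4}
DB = [
    {"a": 1, "c": 10},
    {"a": 2, "b": 3, "d": 1},
    {"b": 4, "c": 5},
    {"a": 1, "b": 1, "c": 2, "d": 2},
]


def contains(t, x):
    """True if transaction t contains every item of itemset x."""
    return all(i in t for i in x)


def utility(x, db, profit):
    """u(X): sum of quantity * unit profit of X's items over transactions containing X."""
    return sum(sum(t[i] * profit[i] for i in x) for t in db if contains(t, x))


def twu(x, db, profit):
    """TWU(X): sum of whole-transaction utilities over transactions containing X.
    TWU(X) >= u(X), and TWU never increases when X grows (anti-monotone)."""
    return sum(sum(q * profit[i] for i, q in t.items()) for t in db if contains(t, x))


def phase1(db, profit, min_util):
    """Level-wise (Apriori-style) search for all itemsets with TWU >= min_util."""
    items = sorted({i for t in db for i in t})
    level = [(i,) for i in items if twu((i,), db, profit) >= min_util]
    candidates = list(level)
    k = 2
    while level:
        prev = set(level)
        nxt = []
        for a, b in combinations(level, 2):
            if a[:-1] != b[:-1]:
                continue  # join only (k-1)-itemsets sharing a (k-2)-prefix
            c = tuple(sorted(set(a) | set(b)))
            # Prune: every (k-1)-subset must itself have passed the TWU test.
            if all(s in prev for s in combinations(c, k - 1)) and twu(c, db, profit) >= min_util:
                nxt.append(c)
        level = sorted(nxt)
        candidates.extend(level)
        k += 1
    return candidates


def phase2(candidates, db, profit, min_util):
    """Exact utility check of each potential itemset (one more pass over the data)."""
    result = {}
    for x in candidates:
        u = utility(x, db, profit)
        if u >= min_util:
            result[x] = u
    return result


def maximal(itemsets):
    """Keep only itemsets that have no proper superset in the collection."""
    sets = [frozenset(x) for x in itemsets]
    return [tuple(sorted(s)) for s in sets if not any(s < t for t in sets)]


cands = phase1(DB, PROFIT, 30)
print(len(cands), "candidates")       # 10 candidates
print(maximal(cands))                 # [('a', 'c'), ('b', 'c'), ('a', 'b', 'd')]
print(phase2(cands, DB, PROFIT, 30))  # {('a', 'b', 'd'): 35}
```

On this toy data, phase 1 keeps 10 potential itemsets, all of which are covered by just 3 maximal ones. Only {a, b, d} (utility 35) is a high utility itemset, even though a, b, and d each fall below the threshold of 30 on their own. This shows why utility itself cannot drive Apriori-style pruning and why an upper bound such as TWU is used in the first step.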