Mining top-K high utility itemsets

  • Authors:
  • Cheng Wei Wu; Bai-En Shie; Vincent S. Tseng; Philip S. Yu

  • Affiliations:
  • Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC; Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC; Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC; Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, USA

  • Venue:
  • Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • Year:
  • 2012

Abstract

Mining high utility itemsets from databases is an emerging topic in data mining. It refers to the discovery of itemsets with utilities higher than a user-specified minimum utility threshold, min_util. Although several studies have been carried out on this topic, setting an appropriate minimum utility threshold remains difficult for users. If min_util is set too low, too many high utility itemsets are generated, which may cause the mining algorithms to become inefficient or even run out of memory. If min_util is set too high, no high utility itemset is found. Finding an appropriate threshold by trial and error is a tedious process. In this paper, we address this problem by proposing a new framework named top-k high utility itemset mining, where k is the desired number of high utility itemsets to be mined. We propose an efficient algorithm named TKU (Top-K Utility itemsets mining) for mining such itemsets without setting min_util. TKU includes several features designed to address the new challenges this problem raises, such as the absence of the anti-monotone property and the requirement of lossless results. Moreover, TKU incorporates several novel strategies for pruning the search space to achieve high efficiency. Results on real and synthetic datasets show that TKU has excellent performance and scalability.
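
To make the problem statement concrete, the sketch below enumerates every itemset of a tiny database by brute force and reports the k itemsets with the highest utility. It is not the TKU algorithm and uses none of its pruning strategies; it only illustrates what a top-k query returns and why a fixed min_util is hard to choose. The database, the unit profits, and the function names are hypothetical, and utility is assumed to follow the usual definition in this line of work: the purchased quantity times the unit profit, summed over the transactions that contain the whole itemset.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
    {"a": 1, "b": 1, "d": 1},
]
# Hypothetical unit profit of each item.
profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def utility(itemset, transactions, profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def all_utilities(transactions, profit):
    """Brute force: score every non-empty itemset that occurs at least once."""
    items = sorted({item for t in transactions for item in t})
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, profit)
            if u > 0:
                scored.append((u, itemset))
    return scored


def high_utility_itemsets(transactions, profit, min_util):
    """Threshold-based mining: keep itemsets with utility higher than min_util."""
    return [p for p in all_utilities(transactions, profit) if p[0] > min_util]


def top_k_brute_force(transactions, profit, k):
    """Top-k mining: return the k highest-utility itemsets, no threshold needed."""
    return sorted(all_utilities(transactions, profit), reverse=True)[:k]


# Utility is not anti-monotone: a superset can score higher than its subset.
print(utility(("b",), transactions, profit))      # 8
print(utility(("b", "d"), transactions, profit))  # 16

# A low threshold returns everything, a high one returns nothing.
print(len(high_utility_itemsets(transactions, profit, 1)))   # 13
print(len(high_utility_itemsets(transactions, profit, 25)))  # 0

# Asking for the top 2 itemsets sidesteps the threshold choice.
print(top_k_brute_force(transactions, profit, 2))
```

In this toy data the itemset {b, d} has a higher utility than its subset {b}, so a search cannot discard the supersets of a low-utility itemset the way support-based miners do. The brute-force enumeration is exponential in the number of items, which is why a practical algorithm needs pruning strategies.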