Efficient Mining of a Concise and Lossless Representation of High Utility Itemsets

  • Authors:
  • Cheng Wei Wu; Philippe Fournier-Viger; Philip S. Yu; Vincent S. Tseng

  • Venue:
  • ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
  • Year:
  • 2011

Abstract

Mining high utility itemsets from transactional databases is an important data mining task: it refers to the discovery of itemsets with high utilities (e.g., high profits). Although several studies have addressed this task, current methods may present too many high utility itemsets to users, which degrades the execution time and memory efficiency of the mining task. To mine efficiently and provide users with a concise result, this paper proposes a novel framework for mining closed+ high utility itemsets, which serve as a compact and lossless representation of high utility itemsets. We present an efficient algorithm called CHUD (Closed+ High Utility Itemset Discovery) for mining closed+ high utility itemsets. Further, we propose a method called DAHU (Derive All High Utility Itemsets) that recovers all high utility itemsets from the set of closed+ high utility itemsets without accessing the original database. Experimental results on real and synthetic datasets show that CHUD and DAHU are very efficient and achieve a massive reduction (up to 800 times in our experiments) in the number of high utility itemsets. In addition, when all high utility itemsets are recovered by DAHU, the approach combining CHUD and DAHU also outperforms state-of-the-art algorithms for mining high utility itemsets.
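The abstract's claim that closed+ high utility itemsets (HUIs) are a lossless representation can be illustrated with a minimal brute-force sketch. It assumes the usual utility model (utility of an item in a transaction = purchased quantity × unit profit; utility of an itemset = sum over the transactions containing it). As an assumption here, a closed+ HUI is read as a closed HUI annotated with its support and a per-item "utility unit array". The toy data (`PROFIT`, `DB`), the threshold, and every function name are hypothetical. This is not CHUD or DAHU: it enumerates every itemset, so it only works on tiny data. It does show why nothing is lost. A subset Y whose closure is a closed itemset X occurs in exactly the same transactions as X, so u(Y) is simply the sum of X's per-item utilities over the items of Y.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): unit profit per item, and for
# each transaction the purchased quantity of each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}
DB = [
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2},
    {"a": 1, "b": 2, "c": 1, "d": 6, "e": 1},
    {"b": 4, "c": 3, "d": 3, "e": 1},
    {"b": 2, "c": 2, "e": 1},
]


def tids(itemset):
    """Indices of the transactions that contain every item of `itemset`."""
    return [t for t, trans in enumerate(DB) if set(itemset).issubset(trans)]


def utility_unit_array(itemset):
    """Per-item utility of `itemset`, summed over its supporting transactions."""
    return {i: sum(PROFIT[i] * DB[t][i] for t in tids(itemset)) for i in itemset}


def utility(itemset):
    """u(X): total utility of X over all transactions that contain X."""
    return sum(utility_unit_array(itemset).values())


def all_itemsets():
    items = sorted(PROFIT)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def is_closed(itemset):
    """Closed: no one-item extension occurs in the same transactions."""
    support = len(tids(itemset))
    return support > 0 and all(
        len(tids(itemset | {i})) < support for i in PROFIT if i not in itemset
    )


def mine_brute_force(min_util):
    """All HUIs, plus the closed ones with (support, utility unit array)."""
    huis = {x: utility(x) for x in all_itemsets() if utility(x) >= min_util}
    closed_plus = {
        x: (len(tids(x)), utility_unit_array(x)) for x in huis if is_closed(x)
    }
    return huis, closed_plus


def derive_all(closed_plus, min_util):
    """Recover every HUI and its utility from closed+ HUIs only (no DB scan)."""
    best = {}  # subset -> (support, unit array) of its max-support closed superset
    for x, (support, units) in closed_plus.items():
        for k in range(1, len(x) + 1):
            for combo in combinations(sorted(x), k):
                y = frozenset(combo)
                if y not in best or support > best[y][0]:
                    best[y] = (support, units)
    derived = {}
    for y, (_, units) in best.items():
        u = sum(units[i] for i in y)  # exact when the superset is Y's closure
        if u >= min_util:
            derived[y] = u
    return derived


if __name__ == "__main__":
    MIN_UTIL = 30  # hypothetical threshold chosen for the toy data
    huis, closed_plus = mine_brute_force(MIN_UTIL)
    assert derive_all(closed_plus, MIN_UTIL) == huis  # lossless on this data
    print(f"{len(huis)} HUIs summarized by {len(closed_plus)} closed+ HUIs")
```

On this toy data the script prints `6 HUIs summarized by 3 closed+ HUIs`. Picking the closed superset with the largest support matters. When a subset's true closure is not a closed HUI, any other closed superset covers fewer transactions, so the computed value stays below the threshold and the subset is correctly discarded. The 800-fold reduction quoted in the abstract comes from the authors' experiments, not from this example.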