Mining itemset utilities from transaction databases

  • Authors:
  • Hong Yao;Howard J. Hamilton

  • Affiliations:
  • Department of Computer Science, University of Regina, Regina, SK, Canada;Department of Computer Science, University of Regina, Regina, SK, Canada

  • Venue:
  • Data & Knowledge Engineering - Special issue: ER 2003
  • Year:
  • 2006

Abstract

The rationale behind mining frequent itemsets is that only itemsets with high frequency are of interest to users. However, the practical usefulness of frequent itemsets is limited by the significance of the discovered itemsets. A frequent itemset reflects only the statistical correlation between items; it does not reflect the semantic significance of the items. In this paper, we propose a utility-based itemset mining approach to overcome this limitation. The proposed approach permits users to quantify their preferences concerning the usefulness of itemsets using utility values. The usefulness of an itemset is characterized as a utility constraint. That is, an itemset is interesting to the user only if it satisfies a given utility constraint. We show that the pruning strategies used in previous itemset mining approaches cannot be applied to utility constraints. In response, we identify several mathematical properties of utility constraints. Then, two novel pruning strategies are designed. Two algorithms for utility-based itemset mining are developed by incorporating these pruning strategies. The algorithms are evaluated by applying them to synthetic and real-world databases. Experimental results show that the proposed algorithms are effective on the databases tested.
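
To make the idea of a utility constraint concrete, the following is a minimal sketch and not the algorithms or pruning strategies proposed in the paper. The abstract does not spell out the utility function, so the sketch assumes a common formulation: the utility of an itemset is the sum, over every transaction containing it, of quantity times a user-assigned per-unit utility for each of its items. The toy transactions, the `unit_utility` values, and all function names are hypothetical, and the search is plain brute-force enumeration.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchased quantity.
transactions = [
    {"A": 1, "B": 2},
    {"A": 2, "C": 1},
    {"A": 1, "B": 1, "C": 3},
    {"B": 4},
]

# Hypothetical per-unit utility values (for example, profit) assigned by the user.
unit_utility = {"A": 5, "B": 1, "C": 10}


def itemset_utility(itemset, transactions, unit_utility):
    """Assumed definition: sum quantity * unit utility over the itemset's
    items, across every transaction that contains the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_utility[item] for item in itemset)
    return total


def support(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if all(item in t for item in itemset))


def high_utility_itemsets(transactions, unit_utility, min_utility):
    """Brute-force enumeration of all itemsets whose utility is at least
    min_utility. Exponential in the number of items; illustration only."""
    items = sorted(unit_utility)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, transactions, unit_utility)
            if u >= min_utility:
                result[itemset] = u
    return result


if __name__ == "__main__":
    for itemset, u in high_utility_itemsets(transactions, unit_utility, 30).items():
        print(itemset, "utility =", u, "support =", support(itemset, transactions))
```

Under these assumptions, the single item `B` has utility 7 and fails a threshold of 30, while its superset `{B, C}` has utility 31 and passes. Support can only shrink as an itemset grows, but utility under this definition can rise, which suggests why frequency-style pruning of supersets does not carry over directly to utility constraints.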