Summary queries for frequent itemsets mining

  • Authors:
  • Shichao Zhang; Zhi Jin; Jingli Lu

  • Affiliations:
  • Department of Computer Science, Zhejiang Normal University, Jinhua, China and State Key Laboratory for Novel Software Technology, Nanjing University, China; Key Laboratory of High Confidence Software Technologies, School of EE & CS, Peking University, Beijing 100871, China; Institute of Information Sci. & Tech., Massey University, New Zealand

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2010

Abstract

Many advanced techniques can efficiently mine frequent itemsets under a minimum-support threshold. However, it remains an open question whether the minimum-support actually helps decision makers make decisions. In this paper, we study four summary queries for frequent itemset mining, namely, (1) finding the support-average of itemsets, (2) finding a support-quantile of itemsets, (3) finding the number of itemsets whose support is greater/less than the support-average, i.e., an approximate distribution of itemsets, and (4) finding the relative frequency of an itemset (comparing its frequency with that of other itemsets in the same dataset). With these queries, a decision maker can learn whether the support of an itemset in question is greater or less than the support-quantile, how itemset supports are distributed, and how frequent an itemset is relative to the others. Processing these summary queries is challenging because the minimum-support constraint cannot be used to prune infrequent itemsets. We propose several simple yet effective approximation solutions. We conduct extensive experiments to evaluate our strategy and show that the proposed approaches model and capture the statistical parameters (summary queries) of itemsets in a database well.
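
To make the four summary queries concrete, the following minimal Python sketch computes them exactly by brute force on a toy dataset. It is not the approximation approach proposed in the paper, whose details the abstract does not give. Several choices here are assumptions made for illustration only: the population of itemsets is taken to be every non-empty itemset that occurs in at least one transaction, the quantile uses the nearest-rank rule, and the relative frequency of an itemset is read as the fraction of itemsets whose support does not exceed its own. All function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def all_itemset_supports(transactions):
    """Exhaustively enumerate itemsets and return {itemset: support}.

    Assumption: only itemsets occurring in at least one transaction are
    counted. No minimum-support pruning is applied, so the cost grows
    exponentially with transaction length.
    """
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    total = len(transactions)
    return {itemset: count / total for itemset, count in counts.items()}


def support_average(supports):
    """Query 1: mean support over all enumerated itemsets."""
    return mean(supports.values())


def support_quantile(supports, q):
    """Query 2: q-quantile of supports (nearest-rank rule, an assumption)."""
    ordered = sorted(supports.values())
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def count_below_above_average(supports):
    """Query 3: number of itemsets strictly below / above the average."""
    average = support_average(supports)
    below = sum(1 for s in supports.values() if s < average)
    above = sum(1 for s in supports.values() if s > average)
    return below, above


def relative_frequency(supports, itemset):
    """Query 4: fraction of itemsets whose support is <= this itemset's.

    This percentile-rank reading is an assumption, not the paper's definition.
    """
    own = supports.get(tuple(sorted(itemset)), 0.0)
    return sum(1 for s in supports.values() if s <= own) / len(supports)


if __name__ == "__main__":
    # Hypothetical toy dataset of four transactions.
    transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
    supports = all_itemset_supports(transactions)
    print("support-average:", support_average(supports))
    print("median support:", support_quantile(supports, 0.5))
    print("below/above average:", count_below_above_average(supports))
    print("relative frequency of {a, b}:", relative_frequency(supports, ["a", "b"]))
```

The exhaustive enumeration illustrates why these queries are hard at scale: without a minimum-support threshold there is nothing to prune, which is the motivation the abstract gives for approximate solutions.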