Finding Maximal Fully-Correlated Itemsets in Large Databases

  • Authors:
  • Lian Duan;William Nick Street

  • Affiliations:
  • -;-

  • Venue:
  • ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Finding the most interesting correlations among items is essential for problems in many commercial, medical, and scientific domains. Much previous research focuses on finding correlated pairs instead of correlated itemsets in which all items are correlated with each other. When designing gift sets, store shelf arrangements, or website product categories, we are more interested in correlated itemsets than correlated pairs. We solve this problem by finding maximal fully-correlated itemsets (MFCIs), in which all subsets are closely related to all other subsets. Putting the items in an MFCI together can promote sales within this itemset. Though some exsiting methods find high-correlation itemsets, they suffer from both efficiency and effectiveness problems in large datasets. In this paper, we explore high-dimensional correlation in two ways. First, we expand the set of desirable properties for correlation measures and study the advantages and disadvantages of various measures. Second, we propose an MFCI framework to decouple the correlation measure from the need for efficient search. By wrapping the best measure in our MFCI framework, we take advantage of likelihood ratio’s superiority in evaluating itemsets, make use of the properties of MFCI to eliminate itemsets with irrelevant items, and still achieve good computational performance.