Efficient association rule mining among infrequent items

  • Authors:
  • Junfeng Ding;Stephen Yau

  • Affiliations:
  • University of Illinois at Chicago;University of Illinois at Chicago

  • Venue:
  • Efficient association rule mining among infrequent items
  • Year:
  • 2005

Abstract

This work develops an efficient way to find valid association rules among infrequent items, a problem that other researchers have seldom addressed. A new disk-based data structure, the Transactional Co-Occurrence Matrix (TCOM), is designed to combine the advantages of the transaction-oriented (horizontal) and item-oriented (vertical) layouts of the database. Any itemset can therefore be randomly accessed and counted without a full scan of the original database or of the TCOM, which significantly increases the efficiency of the algorithms. Two similar compressed matrix structures that reside in memory are then constructed from the TCOM during the mining process, each for different applications. Both structures contain only the infrequent items and incorporate a forest-like structure. The first, the Reduced Transactional Co-Occurrence Matrix (RTCOM), is suitable for applications such as mining large databases or running on machines with relatively small memory. By changing the status of the RTCOM, at the cost of a little more memory, the infrequent patterns and the valid association rules among infrequent items can be mined. The second is a variant of RTCOM called the Simple Transactional Co-Occurrence Matrix (SiTCOM). The code developed on this structure generally consumes more memory but is more efficient, so SiTCOM is suitable for machines with large memory. With small changes to the algorithms, both RTCOM and SiTCOM are also suitable for solving the frequent-item association rule mining problem.
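
The idea of counting itemsets without rescanning the database can be illustrated with a small in-memory sketch. This is not the authors' TCOM, RTCOM, or SiTCOM, whose disk layout and forest-like structure the abstract does not specify. It only assumes a vertical index of transaction-id sets alongside a pairwise co-occurrence table. All names (`build_index`, `support`, `infrequent_items`, `confidence`) and the `max_support` threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_index(transactions):
    """One pass over the horizontal layout builds two lookups (assumed design):
    a vertical index (item -> transaction ids) and pairwise co-occurrence counts."""
    tidsets = defaultdict(set)
    cooc = defaultdict(int)
    for tid, items in enumerate(transactions):
        unique = sorted(set(items))
        for item in unique:
            tidsets[item].add(tid)
        # Keys are ordered pairs (a, b) with a < b, so each pair is stored once.
        for a, b in combinations(unique, 2):
            cooc[(a, b)] += 1
    return dict(tidsets), dict(cooc)


def support(itemset, tidsets):
    """Count transactions containing every item, by intersecting tid-sets
    instead of scanning the transactions again."""
    sets = [tidsets.get(item, set()) for item in itemset]
    if not sets:
        return 0
    return len(set.intersection(*sets))


def infrequent_items(tidsets, max_support):
    """Items whose support is at most max_support (hypothetical threshold)."""
    return sorted(item for item, tids in tidsets.items() if len(tids) <= max_support)


def confidence(antecedent, consequent, tidsets):
    """Confidence of the rule antecedent -> consequent."""
    base = support(antecedent, tidsets)
    if base == 0:
        return 0.0
    return support(list(antecedent) + list(consequent), tidsets) / base
```

For instance, after `tidsets, cooc = build_index(transactions)`, a call such as `support(["x", "y"], tidsets)` touches only the two tid-sets involved, and `confidence(["y"], ["x"], tidsets)` evaluates a candidate rule among the items returned by `infrequent_items`.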