Mining Statistical Information of Frequent Fault-Tolerant Patterns in Transactional Databases

  • Authors:
  • Ardian Kristanto Poernomo;Vivekanand Gopalkrishnan

  • Affiliations:
  • -;-

  • Venue:
  • ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Constraints applied on classic frequent patterns are too strict and may cause interesting patterns to be missed. Hence, researchers have proposed to mine a more relaxed version of frequent patterns, where transactions are allowed to miss some items in the itemset they support. Patterns exhibiting such "faults" are called frequent fault-tolerant patterns (FFT-patterns) if they are significant in number. In this paper, the term "pattern" is distinguished from "itemset" as referring to a pair (tidset 脳 itemset). Unlike classical frequent patterns, the number of FFTpatterns grows exponentially not only with the number of items, but also with the number of transactions. Since the latter may reach millions, mining FFT-patterns by enumerating them becomes infeasible. Hence, the challenge is to represent FFT-patterns concisely without losing any useful information. To address this, we draw on the observation that, in transactional databases, the transactions themselves are not important from the data mining point-ofview; i.e. researchers are interested in finding itemsets contained in lots of transactions, rather than in the transactions per se. Therefore, we propose to mine only the frequent itemsets along with the statistical information of the supporting transaction sets, rather than enumerate entire FFTpatterns. Then we present our approach the BIAS framework, consisting of Backtracking algorithm, Integer Linear Programming (ILP) constraints, and aggregation statistics to solve this problem. Algorithms under this framework not only increase the efficiency of the FFT-patterns mining process by more than an order of magnitude, but also provide a more comprehensive analysis of FFT-Patterns.