Theory of dependence values

  • Authors:
  • Rosa Meo

  • Affiliations:
  • Univ. degli Studi di Torino, Torino, Italy

  • Venue:
  • ACM Transactions on Database Systems (TODS)
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

A new model to evaluate dependencies in data mining problems is presented and discussed. The well-known concept of the association rule is replaced by the new definition of dependence value, which is a single real number uniquely associated with a given itemset. Knowledge of dependence values is sufficient to describe all the dependencies characterizing a given data mining problem. The dependence value of an itemset is the difference between the occurrence probability of the itemset and a corresponding “maximum independence estimate.” This can be determined as a function of joint probabilities of the subsets of the itemset being considered by maximizing a suitable entropy function. So it is possible to separate in an itemset of cardinaltiy k the dependence inherited from its subsets of cardinality (k − 1) and the specific inherent dependence of that itemset. The absolute value of the difference between the probability p(i) of the event i that indicates the prescence of the itemset {a,b,... } and its maximum independence estimate is constant for any combination of values of Q &angl0; a,b,... &angr0; Q. In1paddition, the Boolean function specifying the combination of values for which the dependence is positive is a parity function. So the determination of such combinations is immediate. The model appears to be simple and powerful.