A note on approximation measures for multi-valued dependencies in relational databases

  • Authors:
  • Chris Giannella;Edward Robertson

  • Affiliations:
  • Computer Science Department, Indiana University, Bloomington, IN;Computer Science Department, Indiana University, Bloomington, IN

  • Venue:
  • Information Processing Letters
  • Year:
  • 2003

Quantified Score

Hi-index 0.89

Visualization

Abstract

We consider the problem of defining a normalized approximation measure for multi-valued dependencies in relational database theory. An approximation measure is a function mapping relation instances to real numbers. The number to which an instance is mapped, intuitively, describes the strength of the dependency in that instance. A normalized approximation measure for functional dependencies has been proposed previously: the minimum number of tuples that need be removed for the functional dependency to hold divided by the total number of tuples. This leads naturally to a normalized measure for multivalued dependencies: the minimum number of tuples that need be removed for the multi-valued dependency to hold divided by the total number of tuples.The measure for functional dependencies can be computed efficiently, O(|r|log(|r|)) where |r| is the relation instance. However, we show that an efficient algorithm for computing the analogous measure for multi-valued dependencies is not likely to exist. A polynomial time algorithm for computing the measure would lead to a polynomial time algorithm for an NP-complete problem (proven by a reduction from the maximum edge biclique problem in graph theory). Hence, we argue that it is not a good measure. We propose an alternate measure based on the lossless join characterization of multi-valued dependencies. This measure is efficiently computable, O(|r|2).