Data Mining in Large Databases Using Domain Generalization Graphs

  • Authors:
  • Robert J. Hilderman; Howard J. Hamilton; Nick Cercone

  • Affiliations:
  • Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2. hilder@cs.uregina.ca; Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2. hamilton@cs.uregina.ca; Department of Computer Science, Faculty of Mathematics, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1. ncercone@math.uwaterloo.ca

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 1999

Abstract

Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
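
To make the generate-and-test idea in the abstract concrete, the following is a minimal sketch and not the authors' Multi-Attribute Generalization algorithm. It assumes each domain generalization graph is a single chain of levels (a real DGG may branch), and the hierarchies, the sample rows, and the names `generalize`, `summarize`, `all_summaries`, and `variance_from_uniform` are all hypothetical. The variance score is one plausible form (squared deviation of tuple probabilities from a uniform distribution) and may differ from the measure defined in the paper.

```python
from collections import Counter
from itertools import product

# Hypothetical concept hierarchies: each dict lifts a value one level up.
CITY_LEVELS = [
    {"Regina": "Saskatchewan", "Saskatoon": "Saskatchewan",
     "Waterloo": "Ontario", "Toronto": "Ontario"},
    {"Saskatchewan": "Canada", "Ontario": "Canada"},
]
AGE_LEVELS = [
    {23: "young", 27: "young", 45: "middle", 61: "senior"},
    {"young": "ANY", "middle": "ANY", "senior": "ANY"},
]
HIERARCHIES = [CITY_LEVELS, AGE_LEVELS]

# Hypothetical relation with attributes (city, age).
ROWS = [("Regina", 23), ("Saskatoon", 27), ("Waterloo", 45),
        ("Toronto", 61), ("Regina", 45)]


def generalize(value, levels, depth):
    """Replace a value with its ancestor `depth` levels up the chain."""
    for mapping in levels[:depth]:
        value = mapping[value]
    return value


def summarize(rows, hierarchies, depths):
    """Generalize every attribute to its chosen depth and count equal tuples."""
    return Counter(
        tuple(generalize(v, h, d) for v, h, d in zip(row, hierarchies, depths))
        for row in rows
    )


def all_summaries(rows, hierarchies):
    """Generate one summary per combination of generalization depths."""
    depth_ranges = [range(len(h) + 1) for h in hierarchies]
    return {depths: summarize(rows, hierarchies, depths)
            for depths in product(*depth_ranges)}


def variance_from_uniform(summary):
    """Assumed score: sample variance of tuple probabilities around 1/m."""
    total = sum(summary.values())
    m = len(summary)
    if m < 2:
        return 0.0  # a single-tuple summary has no spread to measure
    return sum((c / total - 1 / m) ** 2 for c in summary.values()) / (m - 1)


# Rank depth combinations from highest to lowest assumed score.
ranked = sorted(all_summaries(ROWS, HIERARCHIES).items(),
                key=lambda item: variance_from_uniform(item[1]),
                reverse=True)
```

Because each depth combination is summarized independently, the keys of `all_summaries` could be split across worker processes, which is the kind of partitioning of path combinations the abstract describes for the parallel version.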