GENCCS: a correlated group difference approach to contrast set mining

  • Authors:
  • Mondelle Simeon; Robert Hilderman

  • Affiliations:
  • Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada;Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada

  • Venue:
  • MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
  • Year:
  • 2011

Abstract

Contrast set mining has developed as a data mining task that aims to discern differences among groups. These groups can be patients, organizations, molecules, or even timelines, and are defined by a selected property that distinguishes one from another. A contrast set is a conjunction of attribute-value pairs whose distribution differs significantly across groups. The search for contrast sets can be prohibitively expensive on relatively large datasets because every combination of attribute-value pairs must be examined, so the search space can grow exponentially. In this paper, we introduce the notion of a correlated group difference (CGD) and propose a contrast set mining technique that uses mutual information and all-confidence to select the most highly correlated attribute-value pairs in order to mine CGDs. Our experiments on real datasets demonstrate the efficiency of our approach and the interestingness of the CGDs discovered.
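
As a rough illustration of the quantities named in the abstract, the sketch below computes a support difference across two groups, the all-confidence of a candidate conjunction, and the mutual information between two attributes. It is not the GENCCS algorithm itself: the toy records, the attribute names, and every function name are hypothetical, the measures follow their common textbook definitions, and the raw support difference stands in for the statistical significance test that a real contrast set miner would apply.

```python
from collections import Counter
from math import log2

# Hypothetical toy data: each record is (group label, {attribute: value}).
records = [
    ("A", {"smoker": "yes", "exercise": "low"}),
    ("A", {"smoker": "yes", "exercise": "low"}),
    ("A", {"smoker": "yes", "exercise": "high"}),
    ("A", {"smoker": "no", "exercise": "high"}),
    ("B", {"smoker": "no", "exercise": "high"}),
    ("B", {"smoker": "no", "exercise": "high"}),
    ("B", {"smoker": "yes", "exercise": "low"}),
    ("B", {"smoker": "no", "exercise": "low"}),
]


def matches(row, pairs):
    """True if the row satisfies every (attribute, value) pair."""
    return all(row.get(attr) == value for attr, value in pairs)


def support(pairs, group=None):
    """Fraction of rows (optionally within one group) matching the conjunction."""
    rows = [row for g, row in records if group is None or g == group]
    return sum(matches(row, pairs) for row in rows) / len(rows)


def support_difference(pairs, g1, g2):
    """Absolute difference in support between two groups (no significance test)."""
    return abs(support(pairs, g1) - support(pairs, g2))


def all_confidence(pairs):
    """Support of the conjunction divided by the largest single-pair support."""
    return support(pairs) / max(support([p]) for p in pairs)


def mutual_information(attr_x, attr_y):
    """Mutual information in bits between two attributes, from empirical counts."""
    n = len(records)
    joint = Counter((row[attr_x], row[attr_y]) for _, row in records)
    px = Counter(row[attr_x] for _, row in records)
    py = Counter(row[attr_y] for _, row in records)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Candidate conjunction of attribute-value pairs to evaluate.
candidate = [("smoker", "yes"), ("exercise", "low")]

print("support difference:", support_difference(candidate, "A", "B"))
print("all-confidence:", all_confidence(candidate))
print("mutual information:", mutual_information("smoker", "exercise"))
```

In this toy setting, a conjunction whose attributes score high on the correlation measures and whose support differs across groups would be a natural candidate to examine further; how GENCCS combines and thresholds these measures is described in the paper, not here.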