Non-redundant subgroup discovery in large and complex data

  • Authors:
  • Matthijs van Leeuwen;Arno Knobbe

  • Affiliations:
  • Dept. of Information & Computing Sciences, Universiteit Utrecht, The Netherlands;LIACS, Universiteit Leiden, The Netherlands

  • Venue:
  • ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Large and complex data is challenging for most existing discovery algorithms, for several reasons. First of all, such data leads to enormous hypothesis spaces, making exhaustive search infeasible. Second, many variants of essentially the same pattern exist, due to (numeric) attributes of high cardinality, correlated attributes, and so on. This causes top-k mining algorithms to return highly redundant result sets, while ignoring many potentially interesting results. These problems are particularly apparent with Subgroup Discovery and its generalisation, Exceptional Model Mining. To address this, we introduce subgroup set mining: one should not consider individual subgroups, but sets of subgroups. We consider three degrees of redundancy, and propose corresponding heuristic selection strategies in order to eliminate redundancy. By incorporating these strategies in a beam search, the balance between exploration and exploitation is improved. Experiments clearly show that the proposed methods result in much more diverse subgroup sets than traditional Subgroup Discovery methods.