Knowledge-Based Sampling for Subgroup Discovery

  • Authors: Martin Scholz

  • Affiliations: Artificial Intelligence Group, Department of Computer Science, University of Dortmund, Germany

  • Venue: LPD'04 Proceedings of the 2004 international conference on Local Pattern Detection

  • Year: 2004

Abstract

Subgroup discovery aims at finding interesting subsets of a classified example set that deviate from the overall distribution. The search is guided by a so-called utility function, which trades the size of subsets (coverage) against their statistical unusualness. With a suitably chosen utility function, subgroup discovery is well suited to finding interesting rules with much smaller coverage and bias than is possible with standard classifier induction algorithms. Smaller subsets can be considered local patterns, but this work uses a different definition: global patterns consist of all patterns reflecting the prior knowledge available to a learner, including all previously found patterns, and all further unexpected regularities in the data are referred to as local patterns. To address local pattern mining in this scenario, an extension of subgroup discovery by the knowledge-based sampling approach to iterative model refinement is presented. It is a general, cheap way of incorporating prior probabilistic knowledge in arbitrary form into data mining algorithms addressing supervised learning tasks.
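
As an illustration of the coverage-versus-unusualness trade-off mentioned in the abstract, the following minimal sketch computes weighted relative accuracy (WRAcc), a utility function commonly used in subgroup discovery. The abstract does not state which utility function the paper uses, so the choice of WRAcc is an assumption made here for illustration. The function name `wracc` and the optional `weights` argument are hypothetical; the weights only indicate how a reweighted example set could be scored and are not taken from the paper.

```python
from typing import Optional, Sequence


def wracc(covered: Sequence[bool],
          positive: Sequence[bool],
          weights: Optional[Sequence[float]] = None) -> float:
    """Weighted relative accuracy: coverage * (precision - class prior).

    covered[i]  -- True if example i belongs to the candidate subgroup
    positive[i] -- True if example i has the target class
    weights[i]  -- optional example weight (hypothetical hook; defaults to 1.0)
    """
    if weights is None:
        weights = [1.0] * len(covered)
    if not (len(covered) == len(positive) == len(weights)):
        raise ValueError("all inputs must have the same length")

    total = sum(weights)
    cov_w = sum(w for w, c in zip(weights, covered) if c)
    if total == 0 or cov_w == 0:
        return 0.0  # an empty subgroup carries no utility

    pos_w = sum(w for w, p in zip(weights, positive) if p)
    both_w = sum(w for w, c, p in zip(weights, covered, positive) if c and p)

    coverage = cov_w / total                  # size of the subgroup
    bias = both_w / cov_w - pos_w / total     # deviation from the class prior
    return coverage * bias


if __name__ == "__main__":
    labels = [True, True, False, False]
    # A subgroup covering exactly the positive examples.
    print(wracc([True, True, False, False], labels))
    # A subgroup covering everything shows no deviation from the prior.
    print(wracc([True, True, True, True], labels))
```

A subgroup that covers the whole example set scores zero under this measure because its class distribution equals the prior, which is one way to see why coverage alone is not rewarded.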