Data mining tasks and methods: Subgroup discovery: deviation analysis

  • Authors: Willi Klösgen
  • Affiliations: Principal Researcher, Fraunhofer Institute for Autonomous Intelligent Systems, Sankt Augustin, Germany
  • Venue: Handbook of data mining and knowledge discovery
  • Year: 2002

Abstract

The two general data analytic questions of subgroup mining (see Chapter 5.2 of this handbook) deal with deviations and associations (see Chapters 16.2.3 and 16.2.4). A deviation pattern describes a deviating behavior (distribution) of a target variable in a subgroup. The target variable and behavior type are selected by the analyst for an individual mining task; the deviating subgroups are determined by the mining method. Deviation patterns rely on statistical tests and thus capture knowledge about a subgroup in the form of a verified alternative hypothesis on the distribution of a target variable. Typically, the rejected null hypothesis assumes an uninteresting, non-deviating subgroup. The search for deviating subgroups is organized in two phases. In a first, brute-force search phase, different search heuristics can be applied to find a set of deviating subgroups. In a second, refinement phase, redundancy elimination operators construct the best system of subgroups from the brute-force search results. We discuss the role of tests for subgroup mining, introduce specializations of the general deviation pattern, summarize search and automatic refinement algorithms, and deal with navigation and visualization operations that support an analyst when interactively constructing the best system of deviating subgroups.
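
The two-phase organization described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: it assumes a binary target, uses a plain one-sample z approximation to the binomial test as the deviation measure, enumerates conjunctions of attribute=value conditions exhaustively, and applies one simple redundancy rule (drop a subgroup when a more general, at least equally deviating subgroup is already kept). All names (`z_score`, `brute_force`, `refine`, the `region`/`age`/`churn` columns, the thresholds) are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from math import sqrt


def z_score(rows, cover, target):
    """z statistic of the target share in a subgroup against the population share.

    Assumption: binary 0/1 target and a normal approximation to the binomial test.
    Returns (z, subgroup size).
    """
    n0 = len(rows)
    p0 = sum(r[target] for r in rows) / n0
    sub = [r for r in rows if cover(r)]
    n = len(sub)
    if n == 0 or p0 in (0.0, 1.0):
        return 0.0, n
    p = sum(r[target] for r in sub) / n
    return (p - p0) / sqrt(p0 * (1 - p0) / n), n


def brute_force(rows, attributes, target, max_depth=2, z_min=2.0):
    """Phase 1: enumerate conjunctive descriptions and keep the deviating ones."""
    selectors = sorted({(a, r[a]) for r in rows for a in attributes})
    found = []
    for depth in range(1, max_depth + 1):
        for desc in combinations(selectors, depth):
            if len({a for a, _ in desc}) < depth:
                continue  # same attribute used twice: empty or redundant description
            z, n = z_score(
                rows, lambda r, d=desc: all(r[a] == v for a, v in d), target
            )
            if abs(z) >= z_min:
                found.append((desc, z, n))
    # Strongest deviations first; the sort is stable, so shorter descriptions win ties.
    return sorted(found, key=lambda t: -abs(t[1]))


def refine(found):
    """Phase 2: drop a subgroup if a kept, more general subgroup deviates at least as much."""
    kept = []
    for desc, z, n in found:
        if not any(set(k) < set(desc) for k, _, _ in kept):
            kept.append((desc, z, n))
    return kept


def make_rows(region, age, size, positives):
    """Build `size` synthetic rows, the first `positives` of which have churn = 1."""
    return [
        {"region": region, "age": age, "churn": 1 if i < positives else 0}
        for i in range(size)
    ]


# Synthetic data (invented for this sketch): churn is common in the north, rare in the south.
rows = (
    make_rows("north", "young", 10, 8)
    + make_rows("north", "old", 10, 7)
    + make_rows("south", "young", 10, 2)
    + make_rows("south", "old", 10, 3)
)

found = brute_force(rows, ["region", "age"], "churn", z_min=1.8)
kept = refine(found)
for desc, z, n in kept:
    print(desc, round(z, 3), n)
```

On this synthetic data the brute-force phase also reports specializations such as region=north combined with age=young; the refinement phase removes them because the more general region subgroups already account for the deviation. A real system would use the test and quality function appropriate to the selected behavior type, together with a heuristic rather than exhaustive search.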