Data mining tasks and methods: Subgroup discovery: deviation analysis

Authors:
Willi Klösgen
Affiliations:
Principal Researcher, Fraunhofer Institute for Autonomous Intelligent Systems, Sankt Augustin, Germany
Venue:
Handbook of data mining and knowledge discovery
Year:
2002

Abstract

The two general data-analytic questions of subgroup mining (see Chapter 5.2 of this handbook) deal with deviations and associations (see Chapters 16.2.3 and 16.2.4). A deviation pattern describes a deviating behavior (distribution) of a target variable in a subgroup. The analyst selects the target variable and behavior type for an individual mining task; the mining method determines the deviating subgroups. Deviation patterns rely on statistical tests and thus capture knowledge about a subgroup in the form of a verified alternative hypothesis on the distribution of a target variable. Typically, the rejected null hypothesis assumes an uninteresting, non-deviating subgroup. The search for deviating subgroups is organized in two phases. In the first, brute-force search phase, different search heuristics can be applied to find a set of deviating subgroups. In the second, refinement phase, redundancy elimination operators construct the best system of subgroups from the brute-force search results. We discuss the role of tests for subgroup mining, introduce specializations of the general deviation pattern, summarize search and automatic refinement algorithms, and deal with navigation and visualization operations that support an analyst when interactively constructing the best system of deviating subgroups.
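
The two-phase organization described above can be illustrated with a minimal sketch. It is not the chapter's algorithm. It assumes a binary target variable, subgroups defined as conjunctions of attribute=value conditions, a one-sided binomial-test z-score as the deviation measure, and a deliberately simple redundancy rule (drop a subgroup when a more general subgroup scores at least as well). All function names, the threshold `min_z`, and the toy data are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from math import sqrt


def z_score(n, p, p0):
    """z-statistic of a binomial test: subgroup share p (size n) vs. overall share p0."""
    if n == 0 or p0 in (0.0, 1.0):
        return 0.0
    return sqrt(n) * (p - p0) / sqrt(p0 * (1.0 - p0))


def brute_force_search(rows, target, max_depth=2, min_z=1.64):
    """Phase 1 (assumed form): enumerate conjunctions and keep positive deviations."""
    p0 = sum(r[target] for r in rows) / len(rows)
    selectors = sorted({(a, v) for r in rows for a, v in r.items() if a != target})
    found = []
    for depth in range(1, max_depth + 1):
        for conds in combinations(selectors, depth):
            # Skip conjunctions that constrain the same attribute twice.
            if len({a for a, _ in conds}) < depth:
                continue
            cover = [r for r in rows if all(r[a] == v for a, v in conds)]
            if not cover:
                continue
            p = sum(r[target] for r in cover) / len(cover)
            z = z_score(len(cover), p, p0)
            if z >= min_z:  # one-sided: only over-represented targets are reported
                found.append((frozenset(conds), len(cover), p, z))
    return found


def eliminate_redundant(found):
    """Phase 2 (assumed rule): drop a subgroup if a kept generalization scores as well."""
    kept = []
    # Best score first; on ties the shorter (more general) description comes first.
    for cand in sorted(found, key=lambda s: (-s[3], len(s[0]))):
        if not any(k[0] < cand[0] for k in kept):
            kept.append(cand)
    return kept


# Hypothetical toy data: 20 customers, target "buy" (1 = bought).
rows = (
    [{"region": "north", "age": "young", "buy": 1}] * 5
    + [{"region": "north", "age": "old", "buy": 1}] * 4
    + [{"region": "north", "age": "old", "buy": 0}] * 1
    + [{"region": "south", "age": "young", "buy": 0}] * 5
    + [{"region": "south", "age": "old", "buy": 0}] * 4
    + [{"region": "south", "age": "old", "buy": 1}] * 1
)

found = brute_force_search(rows, target="buy")
kept = eliminate_redundant(found)
for conds, n, p, z in kept:
    print(sorted(conds), n, p, round(z, 3))
```

On this toy data the first phase reports both `region=north` and its specialization `region=north AND age=young`; the second phase keeps only the more general subgroup because the specialization does not score higher. A real system would likely use a heuristic search rather than full enumeration and a richer notion of redundancy.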