Non-redundant subgroup discovery in large and complex data

Authors:
Matthijs van Leeuwen; Arno Knobbe
Affiliations:
Dept. of Information & Computing Sciences, Universiteit Utrecht, The Netherlands; LIACS, Universiteit Leiden, The Netherlands
Venue:
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
Year:
2011

Abstract

Large and complex data is challenging for most existing discovery algorithms, for several reasons. First of all, such data leads to enormous hypothesis spaces, making exhaustive search infeasible. Second, many variants of essentially the same pattern exist, due to (numeric) attributes of high cardinality, correlated attributes, and so on. This causes top-k mining algorithms to return highly redundant result sets, while ignoring many potentially interesting results. These problems are particularly apparent with Subgroup Discovery and its generalisation, Exceptional Model Mining. To address this, we introduce subgroup set mining: one should not consider individual subgroups, but sets of subgroups. We consider three degrees of redundancy, and propose corresponding heuristic selection strategies in order to eliminate redundancy. By incorporating these strategies in a beam search, the balance between exploration and exploitation is improved. Experiments clearly show that the proposed methods result in much more diverse subgroup sets than traditional Subgroup Discovery methods.
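
To make the contrast between top-k selection and subgroup set selection concrete, the following is a minimal sketch of one possible redundancy-aware selection step of the kind that could be used to choose the beam at each search level. It is an illustration under stated assumptions, not the authors' implementation: the multiplicative overlap discount, the parameter `alpha`, the function names `select_diverse` and `top_k`, and the toy subgroups are all hypothetical. Each candidate is a tuple of description, quality score, and the set of row ids it covers.

```python
from typing import Dict, FrozenSet, List, Tuple

# (description, quality, ids of the rows covered by the subgroup)
Subgroup = Tuple[str, float, FrozenSet[int]]


def top_k(candidates: List[Subgroup], k: int) -> List[Subgroup]:
    """Plain top-k: rank by quality only, ignoring overlap between subgroups."""
    return sorted(candidates, key=lambda sg: sg[1], reverse=True)[:k]


def discounted_score(sg: Subgroup, cover_count: Dict[int, int], alpha: float) -> float:
    """Quality multiplied by the average per-row discount alpha ** (times covered).

    Assumption: this particular weighting is chosen for illustration only.
    """
    _, quality, cover = sg
    if not cover:
        return 0.0
    weight = sum(alpha ** cover_count.get(row, 0) for row in cover) / len(cover)
    return weight * quality


def select_diverse(candidates: List[Subgroup], k: int, alpha: float = 0.5) -> List[Subgroup]:
    """Greedily pick up to k subgroups, penalising rows that are already covered."""
    selected: List[Subgroup] = []
    cover_count: Dict[int, int] = {}
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda sg: discounted_score(sg, cover_count, alpha))
        remaining.remove(best)
        selected.append(best)
        for row in best[2]:
            cover_count[row] = cover_count.get(row, 0) + 1
    return selected


# Hypothetical toy data: three near-identical subgroups on a numeric attribute
# and one lower-quality subgroup covering different rows.
candidates: List[Subgroup] = [
    ("age > 60", 0.90, frozenset({0, 1, 2, 3})),
    ("age > 61", 0.89, frozenset({0, 1, 2, 3})),
    ("age > 59", 0.88, frozenset({0, 1, 2, 3, 4})),
    ("smoker = yes", 0.60, frozenset({5, 6, 7})),
]

plain = [sg[0] for sg in top_k(candidates, 2)]
diverse = [sg[0] for sg in select_diverse(candidates, 2)]
```

On this toy input, plain top-k keeps two variants of the same age threshold, whereas the discounted selection keeps the best age subgroup and then prefers the subgroup that covers previously uncovered rows. Lower values of `alpha` penalise overlap more strongly; `alpha = 1.0` reduces the procedure to ordinary top-k.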