An enhanced relevance criterion for more concise supervised pattern discovery

Authors:
Henrik Großkreutz;Daniel Paurat;Stefan Rüping
Affiliations:
Fraunhofer IAIS, Sankt Augustin, Germany;Fraunhofer IAIS and University of Bonn, Sankt Augustin, Germany;Fraunhofer IAIS, Sankt Augustin, Germany
Venue:
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2012

Abstract

Supervised local pattern discovery aims to find subsets of a database in which the distribution of a target attribute is statistically unusual. Local pattern discovery is often used to generate a human-understandable representation of the most interesting dependencies in a data set. Hence, the crisper and more concise the output is, the better. Unfortunately, standard algorithms often produce very large and redundant outputs. In this paper, we introduce delta-relevance, a stricter criterion of relevance. It allows us to significantly reduce the output space while guaranteeing that every local pattern has a delta-relevant representative that is almost as good in a clearly defined sense. We show empirically that delta-relevance considerably reduces the number of returned patterns. We also demonstrate that, in a top-k setting, removing patterns that are not delta-relevant improves the quality of the result set.
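
To make "statistical unusualness in the distribution of a target attribute" concrete, the following minimal sketch scores one candidate pattern on a toy data set. The abstract does not name a quality function, so the choice of weighted relative accuracy (WRAcc), a measure commonly used in subgroup discovery, is an assumption. The records, the attribute names `smoker` and `ill`, and the helper `wracc` are hypothetical and serve illustration only; the sketch does not implement delta-relevance itself.

```python
from typing import Callable, Dict, List

Record = Dict[str, bool]

# Hypothetical toy data: 8 records, 4 of which have a positive target ("ill").
RECORDS: List[Record] = [
    {"smoker": True, "ill": True},
    {"smoker": True, "ill": True},
    {"smoker": True, "ill": True},
    {"smoker": True, "ill": False},
    {"smoker": False, "ill": True},
    {"smoker": False, "ill": False},
    {"smoker": False, "ill": False},
    {"smoker": False, "ill": False},
]


def wracc(
    records: List[Record],
    pattern: Callable[[Record], bool],
    target: Callable[[Record], bool],
) -> float:
    """Weighted relative accuracy (assumed quality function).

    Computes coverage * (target share inside the pattern - overall target share).
    """
    n = len(records)
    if n == 0:
        return 0.0
    covered = [r for r in records if pattern(r)]
    if not covered:
        return 0.0
    p_all = sum(1 for r in records if target(r)) / n
    p_sub = sum(1 for r in covered if target(r)) / len(covered)
    return (len(covered) / n) * (p_sub - p_all)


def is_ill(record: Record) -> bool:
    return record["ill"]


def is_smoker(record: Record) -> bool:
    return record["smoker"]


def covers_everything(record: Record) -> bool:
    return True


# The pattern "smoker" covers half the data with a 75% target share (overall: 50%).
SMOKER_SCORE = wracc(RECORDS, is_smoker, is_ill)
# A pattern covering every record has no unusualness by construction.
TRIVIAL_SCORE = wracc(RECORDS, covers_everything, is_ill)
```

Under this measure the pattern `smoker` scores 0.5 * (0.75 - 0.5) = 0.125, while the trivial pattern scores 0. A redundancy criterion such as the one the abstract describes would then operate on a set of patterns scored in this way.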