Knowledge-Based sampling for subgroup discovery

Authors:
Martin Scholz
Affiliations:
Artificial Intelligence Group, Department of Computer Science, University of Dortmund, Germany
Venue:
LPD'04 Proceedings of the 2004 international conference on Local Pattern Detection
Year:
2004

Abstract

Subgroup discovery aims at finding interesting subsets of a classified example set that deviate from the overall distribution. The search is guided by a so-called utility function, which trades the size of subsets (coverage) against their statistical unusualness. With a suitably chosen utility function, subgroup discovery is well suited to finding interesting rules with much smaller coverage and bias than is possible with standard classifier induction algorithms. Smaller subsets can be considered local patterns, but this work uses a different definition: global patterns consist of all patterns reflecting the prior knowledge available to a learner, including all previously found patterns, and all further unexpected regularities in the data are referred to as local patterns. To address local pattern mining in this scenario, subgroup discovery is extended by knowledge-based sampling, an approach to iterative model refinement. It is a general, cheap way of incorporating prior probabilistic knowledge in arbitrary form into data mining algorithms that address supervised learning tasks.
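
The trade-off between coverage and statistical unusualness can be illustrated with a small sketch. The abstract does not name a specific utility function, so the form used here, coverage raised to a power times the difference between the subgroup's class share and the class prior, is an assumption taken from one family that is common in the subgroup discovery literature. The function name `utility`, the parameter `alpha`, and the toy data are hypothetical and serve illustration only.

```python
from typing import Sequence


def utility(covered: Sequence[bool], positive: Sequence[bool], alpha: float = 1.0) -> float:
    """Score a candidate subgroup as coverage**alpha * (p - p0).

    covered[i]:  whether example i falls into the subgroup
    positive[i]: whether example i carries the target class
    alpha:       hypothetical knob; smaller values give less weight to coverage
    """
    n = len(covered)
    if n == 0 or n != len(positive):
        raise ValueError("inputs must be non-empty and of equal length")
    n_cov = sum(covered)
    if n_cov == 0:
        return 0.0  # an empty subgroup is assigned zero utility
    coverage = n_cov / n
    p0 = sum(positive) / n  # class prior in the whole example set
    # share of the target class among the covered examples
    p = sum(c and y for c, y in zip(covered, positive)) / n_cov
    return coverage ** alpha * (p - p0)


# Toy data (assumed): 10 examples, 4 positives overall;
# the subgroup covers 4 examples, 3 of which are positive.
covered = [True, True, True, True, False, False, False, False, False, False]
positive = [True, True, True, False, True, False, False, False, False, False]

wracc = utility(covered, positive, alpha=1.0)         # coverage weighted fully
sqrt_variant = utility(covered, positive, alpha=0.5)  # coverage weighted less
```

With `alpha = 1` this form is commonly called weighted relative accuracy. Lowering `alpha` reduces the influence of coverage, which is one way a utility function can be tuned toward smaller, more unusual subgroups.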