Making CN2-SD subgroup discovery algorithm scalable to large size data sets using instance selection

Authors:
José-Ramón Cano;Francisco Herrera;Manuel Lozano;Salvador García
Affiliations:
Department of Computer Science, University of Jaén, 23700 Linares, Jaén, Spain;Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain;Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain;Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain
Venue:
Expert Systems with Applications: An International Journal
Year:
2008

Abstract

Subgroup discovery, the application domain of CN2-SD, is defined as follows: given a population of individuals and a property of those individuals, the goal is to find population subgroups that are as large as possible and have the most unusual statistical characteristics with respect to the property of interest. CN2-SD, a subgroup discovery algorithm based on a separate-and-conquer strategy, faces a scaling problem when it is applied to large data sets. To avoid this problem, we propose using instance selection algorithms to scale down the data sets before the subgroup discovery task. The results show that CN2-SD can be run on large data sets once they have been pre-processed in this way, maintaining and even improving the quality of the discovered subgroups.
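
To illustrate the idea of reducing the data before subgroup discovery, the following is a minimal sketch, not the authors' method. It assumes plain stratified random sampling as a stand-in for the instance selection algorithms (the abstract does not say which ones are used), and it scores a single hand-written rule with weighted relative accuracy (WRAcc), the rule quality measure commonly associated with CN2-SD. The toy data, the attribute names `age` and `risk`, and the helper names `stratified_sample` and `wracc` are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Keep roughly `fraction` of the rows of each class (at least one per class).

    Assumption: this simple random reduction only stands in for a real
    instance selection algorithm.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    selected = []
    for members in by_class.values():
        k = max(1, round(fraction * len(members)))
        selected.extend(rng.sample(members, k))
    return selected


def wracc(rows, condition, label_of, target):
    """Weighted relative accuracy: p(Cond) * (p(Class | Cond) - p(Class))."""
    n = len(rows)
    covered = [r for r in rows if condition(r)]
    if n == 0 or not covered:
        return 0.0
    p_cond = len(covered) / n
    p_class = sum(label_of(r) == target for r in rows) / n
    p_class_given_cond = sum(label_of(r) == target for r in covered) / len(covered)
    return p_cond * (p_class_given_cond - p_class)


def label_of(row):
    return row["risk"]


def rule(row):
    # Hypothetical subgroup description: "age > 60".
    return row["age"] > 60


# Hypothetical toy population: older individuals are more often "high" risk.
rng = random.Random(42)
data = []
for _ in range(10000):
    age = rng.randint(20, 80)
    p_high = 0.6 if age > 60 else 0.2
    data.append({"age": age, "risk": "high" if rng.random() < p_high else "low"})

# Reduce the data first, then evaluate the same rule on both versions.
reduced = stratified_sample(data, label_of, fraction=0.1)
full_score = wracc(data, rule, label_of, "high")
reduced_score = wracc(reduced, rule, label_of, "high")
print(len(data), len(reduced), full_score, reduced_score)
```

In this sketch the rule is evaluated on about a tenth of the instances, and its score on the reduced set can be compared with its score on the full set. A real pipeline would run the full CN2-SD rule search on the reduced set rather than scoring one fixed rule.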