Subgroup discovery in large size data sets preprocessed using stratified instance selection for increasing the presence of minority classes

Authors:
José-Ramón Cano;Salvador García;Francisco Herrera
Affiliations:
Department of Computer Science, University of Jaén, 23700 Linares, Jaén, Spain;Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain;Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain
Venue:
Pattern Recognition Letters
Year:
2008

Abstract

Subgroup discovery is defined as: "given a population of individuals and a property of those individuals, we are interested in finding a population of subgroups as large as possible and in having the most unusual statistical characteristic with respect to the property of interest". Subgroup discovery algorithms face a scaling-up problem when they are applied to large data sets. In this paper we address the extraction of subgroups from large data sets. To avoid the scaling-up problem, we propose combining stratification with instance selection algorithms to scale down the data set before the subgroup discovery task. In addition, we propose two new stratification models that increase the presence of minority classes in the data sets, which affects the subgroup discovery process applied to them. The results show that subgroup discovery can be executed on large preprocessed data sets, independently of the presence of minority classes, which would not be feasible otherwise.
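
The abstract does not spell out the stratification models, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It splits the data into disjoint strata, optionally copies every minority-class instance into each stratum (one hypothetical way to increase minority presence), runs an instance selection algorithm on each stratum, and merges the selected subsets. Hart's condensed nearest neighbour rule stands in here for whichever instance selection algorithm is actually used, and all function names are illustrative.

```python
import random
from collections import defaultdict


def stratify(instances, n_strata, seed=0):
    """Split (features, label) pairs into disjoint strata, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for inst in instances:
        by_class[inst[1]].append(inst)
    strata = [[] for _ in range(n_strata)]
    for members in by_class.values():
        rng.shuffle(members)
        # Round-robin assignment keeps class proportions similar per stratum.
        for i, inst in enumerate(members):
            strata[i % n_strata].append(inst)
    return strata


def boost_minority(strata, instances, minority_labels):
    """Assumed variant: place every minority instance in every stratum."""
    minority = [inst for inst in instances if inst[1] in minority_labels]
    boosted = []
    for stratum in strata:
        majority_part = [i for i in stratum if i[1] not in minority_labels]
        boosted.append(majority_part + minority)
    return boosted


def nearest_label(store, features):
    """Label of the stored instance closest to `features` (squared Euclidean)."""
    closest = min(
        store,
        key=lambda inst: sum((a - b) ** 2 for a, b in zip(inst[0], features)),
    )
    return closest[1]


def condensed_nn(stratum):
    """Hart's condensed nearest neighbour rule, a stand-in instance selector."""
    if not stratum:
        return []
    store = [stratum[0]]
    changed = True
    while changed:
        changed = False
        for inst in stratum:
            # Keep only instances the current store misclassifies with 1-NN.
            if inst not in store and nearest_label(store, inst[0]) != inst[1]:
                store.append(inst)
                changed = True
    return store


def stratified_selection(instances, n_strata, minority_labels=(), seed=0):
    """Select instances per stratum and merge the results without duplicates."""
    strata = stratify(instances, n_strata, seed)
    if minority_labels:
        strata = boost_minority(strata, instances, set(minority_labels))
    selected, seen = [], set()
    for stratum in strata:
        for inst in condensed_nn(stratum):
            if inst not in seen:
                seen.add(inst)
                selected.append(inst)
    return selected
```

The reduced set returned by `stratified_selection` would then be handed to a subgroup discovery algorithm in place of the full data set. Because each stratum is processed independently, the cost of instance selection depends on the stratum size rather than on the size of the whole data set.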