Core-generating approximate minimum entropy discretization for rough set feature selection in pattern classification

Authors:
David Tian;Xiao-jun Zeng;John Keane
Affiliations:
Department of Computing, Faculty of ACES, Sheffield Hallam University, Howard Street, Sheffield S1 1WB, UK and School of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL ...;School of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, UK;School of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, UK
Venue:
International Journal of Approximate Reasoning
Year:
2011


Abstract

Rough set feature selection (RSFS) can be used to improve classifier performance. RSFS removes redundant attributes whilst retaining important ones that preserve the classification power of the original dataset. Reducts are the feature subsets selected by RSFS, and the core is the intersection of all the reducts of a dataset. RSFS can only handle discrete attributes; hence, continuous attributes need to be discretized before being input to RSFS. Discretization determines the core size of a discrete dataset. However, current discretization methods do not consider the core size during discretization. Earlier work proposed the core-generating approximate minimum entropy discretization (C-GAME) algorithm, which selects the maximum number of minimum entropy cuts capable of generating a non-empty core within a discrete dataset. The contributions of this paper are as follows: (1) an improvement of the C-GAME algorithm that adds a new type of constraint to eliminate the possibility that only a single reduct is present in a C-GAME-discrete dataset; (2) a performance evaluation of C-GAME in comparison to C4.5, multi-layer perceptrons, RBF networks and k-nearest neighbours classifiers on ten datasets chosen from the UCI Machine Learning Repository; (3) a performance evaluation of C-GAME in comparison to the Recursive Minimum Entropy Partition (RMEP), ChiMerge, Boolean Reasoning and Equal Frequency discretization algorithms on the ten datasets; (4) an evaluation of the effects of C-GAME and the other four discretization methods on the sizes of reducts; (5) the definition of an upper bound on the total number of reducts within a dataset; (6) an analysis of the effects of different discretization algorithms on the total number of reducts; (7) a performance analysis of two RSFS algorithms (a genetic algorithm and Johnson's algorithm).
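The notions of reduct and core used in the abstract can be illustrated with a minimal Python sketch. It is not the C-GAME algorithm, nor either of the RSFS algorithms evaluated in the paper; it is a brute-force enumeration that is only practical for very small tables. It assumes a consistent, already discretized decision table, so that a reduct can be taken to be a minimal attribute subset under which objects with equal attribute values always share the same class label. The names `is_consistent`, `reducts`, `core` and the toy table `ROWS` are hypothetical and chosen for illustration only.

```python
from itertools import combinations

# Toy discrete decision table (hypothetical data): each row is (a0, a1, a2, label).
# Attribute a2 duplicates a1, so one of the two is redundant.
ROWS = [
    (0, 0, 0, "n"),
    (0, 1, 1, "y"),
    (1, 0, 0, "y"),
    (1, 1, 1, "n"),
]

def is_consistent(rows, attrs):
    """True if rows that agree on `attrs` always share the same class label."""
    seen = {}
    for *values, label in rows:
        key = tuple(values[a] for a in attrs)
        if seen.setdefault(key, label) != label:
            return False
    return True

def reducts(rows, n_attrs):
    """Enumerate all minimal attribute subsets that keep the table consistent.

    Brute force over subsets in order of increasing size; assumes the full
    table is consistent, otherwise no subset qualifies and the result is empty.
    """
    found = []
    for size in range(1, n_attrs + 1):
        for subset in combinations(range(n_attrs), size):
            # A superset of an already found reduct cannot be minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if is_consistent(rows, subset):
                found.append(subset)
    return found

def core(all_reducts):
    """The core is the intersection of all reducts (empty if there are none)."""
    if not all_reducts:
        return set()
    return set.intersection(*(set(r) for r in all_reducts))

all_reducts = reducts(ROWS, 3)
print(all_reducts)        # [(0, 1), (0, 2)]
print(core(all_reducts))  # {0}
```

In this toy table there are two reducts, {a0, a1} and {a0, a2}, and the core is {a0}: the only attribute that every reduct must contain. A different choice of cuts during discretization would change the discrete values in the table and therefore could change both the reducts and the core, which is the dependency the abstract refers to.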