Selecting feature subset for high dimensional data via the propositional FOIL rules

  • Authors:
  • Guangtao Wang; Qinbao Song; Baowen Xu; Yuming Zhou

  • Affiliations:
  • Department of Computer Science & Technology, Xi'an Jiaotong University, 710049, China (Guangtao Wang, Qinbao Song); Department of Computer Science & Technology, Nanjing University, 210093, China (Baowen Xu, Yuming Zhou)

  • Venue:
  • Pattern Recognition
  • Year:
  • 2013

Abstract

Feature interaction is an important issue in feature subset selection. However, most existing algorithms focus only on handling irrelevant and redundant features. In this paper, FRFS, a propositional FOIL-rule-based algorithm that not only retains relevant features and excludes irrelevant and redundant ones but also takes feature interaction into account, is proposed for feature subset selection on high-dimensional data. FRFS first merges the features appearing in the antecedents of all FOIL rules, yielding a candidate feature subset that excludes redundant features while retaining interactive ones. It then identifies and removes irrelevant features by evaluating the candidate features with a new metric, CoverRatio, to obtain the final feature subset. The efficiency and effectiveness of FRFS are extensively tested on both synthetic and real-world data sets, and FRFS is compared with six other representative feature subset selection algorithms (CFS, FCBF, Consistency, Relief-F, INTERACT, and the rule-based FSBAR) in terms of the number of selected features, runtime, and the classification accuracy of four well-known classifiers (Naive Bayes, C4.5, PART, and IB1) before and after feature selection. The results on the five synthetic data sets show that FRFS effectively identifies irrelevant and redundant features while retaining interactive ones. The results on the 35 real-world high-dimensional data sets demonstrate that, compared with the other six feature selection algorithms, FRFS not only efficiently reduces the feature space but also significantly improves the performance of the four classifiers.
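
To make the two-stage procedure described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation: the FOIL rule learner is assumed to be available upstream (rules are passed in as plain data), and the exact definition of the CoverRatio metric is given in the paper itself; the cover-ratio computation below (share of training instances covered by rules that use the feature) is only a placeholder assumption.

    # Illustrative sketch of the two-stage FRFS procedure, not the authors' code.
    from typing import Dict, List, Set

    def frfs(rules: List[Dict], n_instances: int, cover_threshold: float = 0.0) -> Set[str]:
        """Two-stage feature selection from a set of learned FOIL rules.

        Each rule is assumed to be a dict with:
          'antecedent': set of feature names appearing in the rule body
          'covered':    set of indices of training instances the rule covers
        """
        # Stage 1: merge the features appearing in the antecedents of all FOIL
        # rules into a candidate subset (redundant features are absent because
        # FOIL did not need them; interacting features appear together in rule
        # bodies and are therefore kept).
        candidate: Set[str] = set()
        for rule in rules:
            candidate |= rule['antecedent']

        # Stage 2: score each candidate feature with a CoverRatio-style metric
        # (placeholder here: fraction of training instances covered by rules
        # that use the feature) and drop features whose score is too low.
        selected: Set[str] = set()
        for f in candidate:
            covered: Set[int] = set()
            for rule in rules:
                if f in rule['antecedent']:
                    covered |= rule['covered']
            cover_ratio = len(covered) / n_instances
            if cover_ratio > cover_threshold:
                selected.add(f)
        return selected

    # Example usage with two toy rules over features f1..f3:
    # rules = [{'antecedent': {'f1', 'f2'}, 'covered': {0, 1, 2}},
    #          {'antecedent': {'f2', 'f3'}, 'covered': {3}}]
    # frfs(rules, n_instances=10)  # keeps features whose CoverRatio exceeds 0.0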