An optimization of ReliefF for classification in large datasets

Authors:
Yue Huang; Paul J. McCullagh; Norman D. Black
Affiliations:
Division of Epidemiology and Public Health, School of Community Health Sciences, University of Nottingham, Nottingham, UK; Faculty of Computing and Engineering, University of Ulster, Newtownabbey, Northern Ireland, UK; Research and Innovation, University of Ulster, Newtownabbey, Northern Ireland, UK
Venue:
Data & Knowledge Engineering
Year:
2009

Abstract

ReliefF has proved to be a successful feature selector, but it is computationally expensive on large datasets. We present an optimization that uses Supervised Model Construction to improve starter selection. Its effectiveness was evaluated on 12 UCI datasets and a clinical diabetes database. Experiments indicate that, compared with ReliefF, the proposed method improved computational efficiency while maintaining classification accuracy. On the clinical dataset (20,000 records with 47 features), feature selection via Supervised Model Construction (FSSMC) reduced processing time by 80% relative to ReliefF and maintained accuracy for the Naive Bayes, IB1 and C4.5 classifiers.
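
To make the cost concern concrete, the following is a minimal sketch of a simplified, standard ReliefF weight update for numeric features. It is not the authors' FSSMC method, whose details are not given in the abstract. The function name `relieff` and the parameters `n_samples`, `k` and `seed` are illustrative assumptions. The point to notice is that every sampled instance is compared against every other record, so the work grows roughly with (sampled instances) x (records) x (features).

```python
import random


def relieff(X, y, n_samples=None, k=3, seed=0):
    """Simplified ReliefF for numeric features (illustrative sketch only).

    Returns one relevance weight per feature; higher means more relevant.
    """
    n, d = len(X), len(X[0])
    m = n if n_samples is None else n_samples
    rng = random.Random(seed)

    # Feature ranges normalise diff() into [0, 1]; constant features use 1.
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    weights = [0.0] * d

    for _ in range(m):
        i = rng.randrange(n)  # sampled with replacement
        # Full scan of the dataset for each sampled instance: the costly step.
        by_class = {c: [] for c in classes}
        for t in range(n):
            if t != i:
                by_class[y[t]].append((dist(X[i], X[t]), t))
        for c in classes:
            nearest = sorted(by_class[c])[:k]
            if not nearest:
                continue
            if c == y[i]:
                scale = -1.0  # nearest hits lower the weight
            else:
                # nearest misses raise it, weighted by the class prior
                scale = prior[c] / (1.0 - prior[y[i]])
            for _, t in nearest:
                for j in range(d):
                    weights[j] += scale * diff(j, X[i], X[t]) / (m * len(nearest))
    return weights


if __name__ == "__main__":
    # Toy data (made up for illustration): feature 0 separates the classes,
    # feature 1 does not.
    X = [[0.0, 0.0], [0.1, 0.5], [0.2, 1.0],
         [0.8, 0.0], [0.9, 0.5], [1.0, 1.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(relieff(X, y, k=1))
```

On the toy data, feature 0 receives the higher weight because near misses differ strongly on it while near hits do not. Any speed-up of the kind the abstract describes would have to reduce the number of instances that go through the inner full scan, though how FSSMC achieves this is not stated here.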