extraRelief: Improving Relief by Efficient Selection of Instances

Authors:
Manoranjan Dash; Ong Cher Yee
Affiliations:
School of Computer Engineering, Nanyang Technological University, Singapore; School of Computer Engineering, Nanyang Technological University, Singapore
Venue:
AI'07: Proceedings of the 20th Australian Joint Conference on Advances in Artificial Intelligence
Year:
2007

Abstract

In this paper we propose a modified and improved RELIEF method, called EXTRARELIEF. RELIEF is a popular feature selection algorithm proposed by Kira and Rendell in 1992. RELIEF and its extensions have been found to outperform many other feature selection methods, but in this paper we show that RELIEF can be improved further. In the main loop of RELIEF, a number of instances are selected using simple random sampling (SRS). For each selected instance, the nearest hit (the closest instance of the same class) and the nearest miss (the closest instance of a different class) are determined. These are used to update per-feature weights, by which the features are then ranked. SRS fails to represent the whole dataset properly when the sampling ratio is small (i.e., when the dataset is large) and/or when the data is noisy. In EXTRARELIEF we use an efficient method to select instances, based on the idea that a representative sample has a distribution similar to that of the whole dataset. We approximate the data distribution by the frequencies of attribute values. Experimental comparison with RELIEF shows that EXTRARELIEF performs significantly better, particularly for large and/or noisy domains.
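The abstract does not spell out EXTRARELIEF's exact selection procedure, so the following is only an illustrative sketch under stated assumptions. It implements the basic two-class RELIEF loop over discretized (nominal) attributes, using nearest hit, nearest miss and a per-feature weight update, and lets the caller supply the sampled instances.

Two samplers are shown. The first is SRS, as in RELIEF. The second is `frequency_matched_sample`, a hypothetical greedy rule that keeps the sample's attribute-value frequencies close (in L1 distance) to those of the full dataset. This greedy rule and all function names are assumptions made for illustration; they are not the authors' algorithm.

```python
import random
from collections import Counter


def diff(x, y):
    """Per-attribute difference for nominal (discretized) values: 0 if equal, else 1."""
    return [0 if a == b else 1 for a, b in zip(x, y)]


def nearest(r, X, candidates):
    """Index of the candidate (other than r) closest to X[r] under Hamming distance."""
    best, best_d = None, None
    for j in candidates:
        if j == r:
            continue
        d = sum(diff(X[r], X[j]))
        if best_d is None or d < best_d:
            best, best_d = j, d
    return best


def relief(X, y, sample):
    """Two-class RELIEF: weights grow for attributes that separate an instance from its
    nearest miss and shrink for attributes that differ from its nearest hit."""
    w = [0.0] * len(X[0])
    for r in sample:
        hits = [j for j in range(len(X)) if y[j] == y[r]]
        misses = [j for j in range(len(X)) if y[j] != y[r]]
        h, m = nearest(r, X, hits), nearest(r, X, misses)
        if h is None or m is None:
            raise ValueError("each class needs at least two instances and a second class")
        dh, dm = diff(X[r], X[h]), diff(X[r], X[m])
        for a in range(len(w)):
            w[a] += (dm[a] - dh[a]) / len(sample)
    return w


def srs(n, k, seed=0):
    """Simple random sampling: k distinct row indices out of n, as in RELIEF."""
    return random.Random(seed).sample(range(n), k)


def value_freqs(X, idx):
    """Relative frequency of each (attribute, value) pair over the rows in idx."""
    idx = list(idx)
    counts = Counter((a, v) for i in idx for a, v in enumerate(X[i]))
    return {key: c / len(idx) for key, c in counts.items()}


def freq_distance(p, q):
    """L1 distance between two attribute-value frequency tables."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def frequency_matched_sample(X, k):
    """HYPOTHETICAL greedy selector (not the paper's algorithm): repeatedly add the row
    that keeps the sample's attribute-value frequencies closest to the full data's."""
    target = value_freqs(X, range(len(X)))
    chosen, remaining = [], set(range(len(X)))
    for _ in range(k):
        best = min(remaining,
                   key=lambda i: (freq_distance(value_freqs(X, chosen + [i]), target), i))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy discretized data: attribute 0 determines the class, attribute 1 is noise.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1)]
y = [0, 0, 1, 1, 0, 1]
print(relief(X, y, srs(len(X), 3)))
print(relief(X, y, frequency_matched_sample(X, 2)))  # → [1.0, 0.0]
```

The greedy selector in this sketch costs one frequency-table comparison per candidate per step, so it only demonstrates the idea of matching attribute-value frequencies. It is not meant as an efficient method for large data.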