A study on feature selection for toxicity prediction

  • Authors:
  • Gongde Guo; Daniel Neagu; Mark T. D. Cronin

  • Affiliations:
  • Department of Computing, University of Bradford, Bradford, UK; Department of Computing, University of Bradford, Bradford, UK; School of Pharmacy and Chemistry, Liverpool John Moores University, UK

  • Venue:
  • FSKD'05: Proceedings of the Second International Conference on Fuzzy Systems and Knowledge Discovery, Volume Part II
  • Year:
  • 2005

Abstract

The increasing amount and complexity of data used in predictive toxicology call for efficient and effective feature selection methods in the data pre-processing stage of data mining. In this paper, we propose a kNN model-based feature selection method (kNNMFS) aimed at overcoming the weaknesses of the ReliefF method. It modifies ReliefF by: (1) using a kNN model as an initial selection step that chooses a set of more meaningful representatives to replace the original data for feature selection; (2) integrating the Heterogeneous Value Difference Metric (HVDM) to handle heterogeneous applications, i.e. those with both ordinal and nominal features; and (3) introducing a simple method for calculating the difference function. The performance of kNNMFS was evaluated on the Phenols toxicity data set using a linear regression algorithm. Experimental results indicate that kNNMFS significantly improves the classification accuracy on the trial data set.
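The abstract names two standard building blocks that kNNMFS modifies: ReliefF feature weighting and the Heterogeneous Value Difference Metric (HVDM). The Python sketch below combines them in the plainest way, as an illustration under stated assumptions rather than the authors' method: it is ordinary ReliefF whose nearest hits and misses are ranked by HVDM, with weights updated by the usual ReliefF difference function (0/1 for nominal attributes, range-normalised absolute difference for numeric ones). The kNN-model representative selection of step (1) and the authors' own difference function of step (3) are not reproduced. The names `make_hvdm` and `relieff` and the parameters `k`, `m` and `seed` are hypothetical, and missing values are not handled.

```python
import math
import random
from collections import Counter, defaultdict


def make_hvdm(X, y, nominal):
    """Return an HVDM distance function fitted on rows X with class labels y.

    nominal[a] is True for nominal attributes; all others are treated as numeric.
    Missing values are not handled in this sketch.
    """
    n_attr = len(X[0])
    classes = list(set(y))
    sigma = {}   # standard deviation of each numeric attribute
    counts = {}  # counts[a][value][cls] for each nominal attribute
    for a in range(n_attr):
        col = [row[a] for row in X]
        if nominal[a]:
            table = defaultdict(Counter)
            for v, c in zip(col, y):
                table[v][c] += 1
            counts[a] = table
        else:
            mean = sum(col) / len(col)
            sigma[a] = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))

    def attr_dist(a, u, v):
        if nominal[a]:
            # Normalised value difference over class-conditional frequencies.
            cu, cv = counts[a][u], counts[a][v]
            nu, nv = sum(cu.values()), sum(cv.values())
            return math.sqrt(sum((cu[c] / nu - cv[c] / nv) ** 2 for c in classes))
        # Numeric attribute: absolute difference scaled by four standard deviations.
        return abs(u - v) / (4 * sigma[a]) if sigma[a] > 0 else 0.0

    def hvdm(r1, r2):
        return math.sqrt(sum(attr_dist(a, r1[a], r2[a]) ** 2 for a in range(n_attr)))

    return hvdm


def relieff(X, y, nominal, k=3, m=None, seed=0):
    """Standard ReliefF weights with neighbours ranked by HVDM (hypothetical sketch, not kNNMFS)."""
    n, n_attr = len(X), len(X[0])
    dist = make_hvdm(X, y, nominal)
    # Value ranges for the usual ReliefF diff on numeric attributes.
    span = {}
    for a in range(n_attr):
        if not nominal[a]:
            col = [row[a] for row in X]
            span[a] = (max(col) - min(col)) or 1.0

    def diff(a, r1, r2):
        if nominal[a]:
            return 0.0 if r1[a] == r2[a] else 1.0
        return abs(r1[a] - r2[a]) / span[a]

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    m = m or n  # number of sampled instances (all of them by default)
    sample = random.Random(seed).sample(range(n), m)
    w = [0.0] * n_attr
    for i in sample:
        r, cr = X[i], y[i]
        for c in prior:
            # k nearest neighbours of r inside class c, ranked by HVDM.
            near = sorted((j for j in range(n) if j != i and y[j] == c),
                          key=lambda j: dist(r, X[j]))[:k]
            # Hits lower the weight; misses raise it, weighted by the class prior.
            scale = -1.0 if c == cr else prior[c] / (1.0 - prior[cr])
            for j in near:
                for a in range(n_attr):
                    w[a] += scale * diff(a, r, X[j]) / (m * k)
    return w
```

On a toy set such as `X = [[0.0, 'a'], [0.1, 'b'], [0.2, 'a'], [0.9, 'b'], [1.0, 'a'], [0.8, 'b']]` with `y = [0, 0, 0, 1, 1, 1]`, where the numeric attribute separates the classes and the nominal one does not, `relieff(X, y, [False, True], k=2)` assigns the numeric attribute the larger weight. Keeping HVDM for neighbour ranking separate from the per-attribute difference function mirrors the structure the abstract describes, in which steps (2) and (3) change these two pieces independently.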