Improved binary PSO for feature selection using gene expression data

  • Authors:
  • Li-Yeh Chuang;Hsueh-Wei Chang;Chung-Jui Tu;Cheng-Hong Yang

  • Affiliations:
  • Department of Chemical Engineering, I-Shou University, Kaohsiung 840, Taiwan;Department of Biomedical Science and Environmental Biology, and Graduate Institute of Natural Products, College of Pharmacy, Kaohsiung Medical University, Kaohsiung, 807, Taiwan;Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung 807, Taiwan;Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung 807, Taiwan

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2008

Quantified Score

Hi-index 0.02

Abstract

Gene expression profiles, which represent the state of a cell at the molecular level, have great potential as a medical diagnosis tool. In cancer type classification, the available training data sets generally have a small sample size compared to the number of genes involved. This limitation poses a challenge to certain classification methodologies. A reliable method for selecting the genes relevant to sample classification is needed in order to speed up processing, decrease the predictive error rate, and avoid the incomprehensibility that results from investigating a large number of genes. In this study, improved binary particle swarm optimization (IBPSO) is used to implement feature selection, and the K-nearest neighbor (K-NN) method serves as an evaluator of the IBPSO for gene expression data classification problems. Experimental results show that this method effectively simplifies feature selection and reduces the total number of features needed. Compared to the best results previously published, the proposed method obtains the highest classification accuracy in nine of the 11 gene expression data test problems, and its accuracy is comparable in the other two test problems.
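The wrapper idea described in the abstract, a binary swarm proposing gene subsets and a nearest-neighbor classifier scoring them, can be illustrated with a minimal sketch. This is a plain binary PSO with a sigmoid transfer function, not the authors' IBPSO: the abstract does not describe the specific improvement, so none is reproduced here. The use of K = 1, leave-one-out accuracy as the fitness, all parameter values (`n_particles`, `n_iter`, `w`, `c1`, `c2`, `v_max`), the function names, and the synthetic data are assumptions made for illustration only.

```python
import math
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only features where mask[j] == 1."""
    idx = [j for j, m in enumerate(mask) if m]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue
            # squared Euclidean distance restricted to the selected features
            d = sum((X[i][j] - X[k][j]) ** 2 for j in idx)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def binary_pso_select(X, y, n_particles=10, n_iter=20,
                      w=0.9, c1=2.0, c2=2.0, v_max=6.0, seed=0):
    """Generic binary PSO over feature masks (all parameters are assumed values)."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [loo_1nn_accuracy(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_feat):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp the velocity
                # the sigmoid maps velocity to the probability of selecting feature j
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = loo_1nn_accuracy(X, y, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def make_toy_data(n_samples=20, n_feat=8, seed=1):
    """Synthetic stand-in for expression data: feature 0 separates the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_feat)]
        row[0] += 5.0 * label  # the only informative feature
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
mask, acc = binary_pso_select(X, y)
print("selected features:", [j for j, m in enumerate(mask) if m])
print("leave-one-out 1-NN accuracy:", acc)
```

Because the fitness here is accuracy alone, the sketch does nothing to favor smaller subsets; a fitness that also penalizes the number of selected features would be one way to pursue the reduction in feature count that the abstract reports.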