Ensemble component selection for improving ICA based microarray data prediction models

  • Authors:
  • Kun-Hong Liu; Bo Li; Jun Zhang; Ji-Xiang Du

  • Affiliations:
  • School of Software, Xiamen University, 361005, Xiamen, Fujian, P.R. China; School of Computer Science and Technology, Wuhan University of Science and Technology, 430081, 947 Heping Road, Wuhan, Hubei, P.R. China; School of Electronic Science and Technology, Anhui University, Hefei, Anhui, P.R. China; Department of Computer Science and Technology, Huaqiao University, Quanzhou 362021, Fujian, P.R. China

  • Venue:
  • Pattern Recognition
  • Year:
  • 2009

Quantified Score

Hi-index 0.01

Abstract

Independent component analysis (ICA) has been widely used for microarray data classification, but one problem remains unsolved: the independent component (IC) sets may not be reproducible across different ICA transformations. Inspired by ensemble feature selection, we design an ICA based ensemble learning system that exploits the differences among IC sets. In this system, several IC sets are first generated by different ICA transformations. A multi-objective genetic algorithm (MOGA) is then designed to select different biologically significant IC subsets from these IC sets, and the selected subsets are used to build base classifiers. Three schemes are used to fuse these base classifiers. The first scheme combines all individuals in the final generation of the MOGA. For the second scheme, we design a global-recording technique that, during evolution, records the best IC subsets of each IC set in a global-recording list; the IC subsets in the list are then used to build the base classifiers. The third scheme prunes about half of the less accurate base classifiers obtained by the second scheme, yielding a more compact and more accurate ensemble. Experiments on three microarray datasets demonstrate that these ensemble schemes further improve the performance of the ICA based classification model, and that the third scheme produces the most accurate ensemble system with the smallest ensemble size.
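
The third fusion scheme can be illustrated with a minimal sketch. The abstract does not state how accuracy is measured for pruning or which rule fuses the remaining classifiers, so the code below makes two assumptions: accuracy is computed on a held-out validation set, and fusion is plain majority voting. The function names `prune_half` and `majority_vote` are hypothetical and do not come from the paper.

```python
from collections import Counter


def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def prune_half(val_preds, val_labels):
    """Return indices of the more accurate half of the base classifiers.

    val_preds[i] holds classifier i's predictions on a validation set
    (assumption: the paper may score classifiers differently).
    """
    scores = [accuracy(p, val_labels) for p in val_preds]
    order = sorted(range(len(val_preds)), key=lambda i: scores[i], reverse=True)
    keep = max(1, (len(order) + 1) // 2)  # keep about half, at least one
    return sorted(order[:keep])


def majority_vote(preds):
    """Fuse predictions sample by sample (assumed fusion rule).

    Ties go to the label that appears first among the votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*preds)]


# Toy example: six base classifiers, four validation samples.
val_labels = [0, 1, 1, 0]
val_preds = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
]
kept = prune_half(val_preds, val_labels)
fused = majority_vote([val_preds[i] for i in kept])
print(kept, fused)
```

In this toy case the three least accurate classifiers are dropped before voting, which mirrors the idea of a smaller ensemble built from the second scheme's base classifiers.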