Feature selection using localized generalization error for supervised classification problems using RBFNN

  • Authors:
  • Wing W. Y. Ng, Daniel S. Yeung, Michael Firth, Eric C. C. Tsang, and Xi-Zhao Wang

  • Affiliations:
  • Wing W. Y. Ng: School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China, and Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
  • Daniel S. Yeung: School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China, and Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
  • Michael Firth: Department of Finance and Insurance, Lingnan University, Hong Kong
  • Eric C. C. Tsang: Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
  • Xi-Zhao Wang: Machine Learning Center, Faculty of Mathematics and Computer Science, Hebei University, Baoding 071002, China

  • Venue:
  • Pattern Recognition
  • Year:
  • 2008

Abstract

A pattern classification problem usually involves high-dimensional features, which make the classifier complex and difficult to train. Without feature reduction, both training accuracy and generalization capability suffer. This paper proposes a novel hybrid filter-wrapper feature subset selection method based on a localized generalization error model. For a radial basis function neural network (RBFNN), this model bounds from above the generalization error for unseen samples located within a neighborhood of the training samples. The feature making the smallest contribution to this error bound is removed iteratively. Moreover, the proposed feature selection method is independent of the sample size and is computationally fast. Experimental results show that it consistently removes a large percentage of features with a statistically insignificant loss of testing accuracy on unseen samples. For two of the datasets, classifiers built on feature subsets with 90% of the features removed by our approach yield higher average testing accuracies than classifiers trained on the full feature set. Finally, we corroborate the efficacy of the model by using it to predict corporate bankruptcies in the US.
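
To make the selection loop concrete, below is a minimal Python sketch of the backward-elimination idea described in the abstract. It is not the authors' implementation: the RBFNN here uses randomly chosen centers with least-squares output weights, and the localized generalization error bound is approximated by Monte Carlo perturbation within a Q-neighborhood of the training samples rather than the paper's closed-form stochastic sensitivity measure. All identifiers (rbfnn_fit, lgem_bound, backward_select, Q, n_centers) are illustrative assumptions, not names from the paper.

```python
# Sketch: backward feature elimination guided by a localized generalization
# error (L-GEM) style bound for a simple Gaussian RBF network.
# The bound is approximated by Monte Carlo perturbation, not the paper's
# closed-form sensitivity; treat this as an illustration of the loop only.
import numpy as np

rng = np.random.default_rng(0)

def rbfnn_fit(X, y, n_centers=10, width=1.0):
    """Train a simple RBF network: random centers + least-squares output weights."""
    centers = X[rng.choice(len(X), size=min(n_centers, len(X)), replace=False)]
    H = np.exp(-np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
               / (2.0 * width ** 2))
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, width, w

def rbfnn_predict(model, X):
    centers, width, w = model
    H = np.exp(-np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
               / (2.0 * width ** 2))
    return H @ w

def lgem_bound(model, X, y, Q=0.1, n_mc=50):
    """Monte Carlo stand-in for the L-GEM bound: (sqrt(R_emp) + sqrt(E[dy^2]))^2,
    where dy is the output change under uniform input noise in [-Q, Q]^n."""
    base = rbfnn_predict(model, X)
    r_emp = np.mean((base - y) ** 2)          # empirical (training) MSE
    dy2 = np.mean([                           # expected squared output perturbation
        np.mean((rbfnn_predict(model, X + rng.uniform(-Q, Q, X.shape)) - base) ** 2)
        for _ in range(n_mc)
    ])
    return (np.sqrt(r_emp) + np.sqrt(dy2)) ** 2

def backward_select(X, y, n_keep=2, Q=0.1):
    """Iteratively drop the feature whose removal increases the bound the least."""
    kept = list(range(X.shape[1]))
    while len(kept) > n_keep:
        scores = []
        for f in kept:
            trial = [g for g in kept if g != f]
            model = rbfnn_fit(X[:, trial], y)
            scores.append((lgem_bound(model, X[:, trial], y, Q=Q), f))
        best_bound, worst_feature = min(scores)  # removal that hurts the bound least
        kept.remove(worst_feature)
        print(f"removed feature {worst_feature}, bound = {best_bound:.4f}")
    return kept

# Toy usage: 6 features, only the first two carry the class signal.
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
print("selected features:", backward_select(X, y))
```

Because the bound is evaluated on the training samples and their Q-neighborhoods only, each elimination step needs no held-out data, which is consistent with the abstract's claim that the method is independent of the sample size; the Monte Carlo estimate used here simply trades the paper's analytic sensitivity term for a few extra forward passes.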