Feature selection for high-dimensional machinery fault diagnosis data using multiple models and Radial Basis Function networks

  • Authors:
  • Kui Zhang; Yuhua Li; Philip Scarf; Andrew Ball

  • Affiliations:
  • Centre of OR and Applied Statistics, Salford Business School, University of Salford, Manchester M5 4WT, UK; School of Computing and Intelligent Systems, The University of Ulster, Londonderry BT48 7JL, UK; Centre of OR and Applied Statistics, Salford Business School, University of Salford, Manchester M5 4WT, UK; School of Computing and Engineering, The University of Huddersfield, Huddersfield HD1 3DH, UK

  • Venue:
  • Neurocomputing
  • Year:
  • 2011

Abstract

Machinery fault diagnosis has improved considerably in recent years through the application of pattern classification methods. However, these methods suffer from the "curse of dimensionality" when applied to high-dimensional fault diagnosis data. To address this problem, this paper proposes a hybrid model that combines multiple feature selection models to select the most significant input features from all potentially relevant features. Eight filter models are first used to pre-rank the candidate features: data variance, Pearson correlation coefficient, the Relief algorithm, Fisher score, class separability, chi-squared, information gain and gain ratio. These variable ranking models measure features from different perspectives and therefore produce different rankings. A weighted voting scheme, based on the effect of each ranking on Radial Basis Function (RBF) classification, is then introduced to re-rank the features. Finally, two wrapper models, a Binary Search (BS) model and a Sequential Backward Search (SBS) model, are used to minimize the number of relevant features. Two case studies demonstrate the potential of the method for machinery fault diagnosis. The experimental results support the conclusion that the method is useful for revealing fault-related frequency features.
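The pipeline described in the abstract (filter pre-ranking, weighted voting, then a wrapper search) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several parts are assumptions made for illustration: only three of the eight filters are shown (variance, Pearson correlation, Fisher score), a nearest-centroid classifier stands in for the RBF network, the vote weights are taken to be validation accuracy on each filter's top-ranked features, the votes are combined Borda-style, and the wrapper step is a simple backward elimination rather than the paper's BS or SBS models. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def variance_score(X, y):
    # Unsupervised filter: features with larger variance rank higher.
    return X.var(axis=0)

def pearson_score(X, y):
    # Absolute Pearson correlation between each feature and the class label.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def fisher_score(X, y):
    # Between-class scatter divided by within-class scatter, per feature.
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xk = X[y == c]
        num += len(Xk) * (Xk.mean(axis=0) - mu) ** 2
        den += len(Xk) * Xk.var(axis=0)
    return num / (den + 1e-12)

def centroid_accuracy(X_train, y_train, X_val, y_val):
    # ASSUMPTION: nearest-centroid classifier used in place of an RBF network.
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y_val).mean())

def weighted_vote_ranking(X_train, y_train, X_val, y_val, top_k=3):
    # Re-rank features by summing Borda points from each filter, weighted by
    # the validation accuracy obtained with that filter's top_k features.
    n_features = X_train.shape[1]
    votes = np.zeros(n_features)
    for score_fn in (variance_score, pearson_score, fisher_score):
        order = np.argsort(-score_fn(X_train, y_train))  # best feature first
        sel = order[:top_k]
        weight = centroid_accuracy(X_train[:, sel], y_train, X_val[:, sel], y_val)
        points = np.empty(n_features)
        points[order] = np.arange(n_features, 0, -1)  # best gets most points
        votes += weight * points
    return np.argsort(-votes)

def backward_search(X_train, y_train, X_val, y_val, ranking):
    # Drop features from the lowest-ranked upward while accuracy does not fall.
    selected = [int(f) for f in ranking]
    best = centroid_accuracy(X_train[:, selected], y_train, X_val[:, selected], y_val)
    for feat in reversed([int(f) for f in ranking]):
        if len(selected) == 1:
            break
        trial = [f for f in selected if f != feat]
        acc = centroid_accuracy(X_train[:, trial], y_train, X_val[:, trial], y_val)
        if acc >= best:
            selected, best = trial, acc
    return selected, best

# Synthetic two-class data (hypothetical): only features 0 and 1 carry signal.
rng = np.random.default_rng(0)
y = np.arange(200) % 2
X = rng.normal(size=(200, 10))
X[:, 0] += 3.0 * y
X[:, 1] -= 3.0 * y
X_train, y_train, X_val, y_val = X[:100], y[:100], X[100:], y[100:]

ranking = weighted_vote_ranking(X_train, y_train, X_val, y_val)
selected, best = backward_search(X_train, y_train, X_val, y_val, ranking)
print("re-ranked features:", ranking)
print("selected subset:", selected, "validation accuracy:", best)
```

In this sketch the filters that yield better classification on their top-ranked features contribute more to the combined ranking, which mirrors the role the abstract assigns to the RBF classifier in the voting step. The backward elimination then prunes features whose removal does not reduce validation accuracy.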