Feature selection based on a modified fuzzy C-means algorithm with supervision

  • Authors:
  • Francesco Marcelloni

  • Affiliations:
  • Dipartimento di Ingegneria della Informazione: Elettronica, Informatica, Telecomunicazioni, University of Pisa, Via Diotisalvi 2, Pisa 56122, Italy

  • Venue:
  • Information Sciences—Informatics and Computer Science: An International Journal
  • Year:
  • 2003

Abstract

In this paper we propose a new approach to feature selection based on a modified fuzzy C-means algorithm with supervision (MFCMS). MFCMS completes the unsupervised learning of classical fuzzy C-means with labeled patterns. The labeled patterns allow MFCMS to accurately model the shape of each cluster and consequently to highlight the features that are particularly effective in characterizing a cluster. These features are distinguished by a low variance of their values over the patterns with a high membership degree in the cluster. If, with respect to these features, the distance between the prototype of the cluster and the prototypes of the other clusters is also high, then these features discriminate between the cluster and the other clusters. To take these two aspects into account, for each cluster and each feature, we introduce a purposely defined index: the higher the value of the index, the higher the discrimination capability of the feature for the cluster. We execute MFCMS on the training set, considering all patterns as labeled. Then, we retain the features that are associated, for at least one cluster, with an index larger than a threshold τ. We applied MFCMS to several real-world pattern classification benchmarks, using the well-known k-nearest neighbors classifier as the learning algorithm. We show that feature selection performed by MFCMS improved generalization on all data sets.
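The abstract does not give the formula of the index, so the following is only a minimal sketch of the general idea, not the index defined in the paper. It assumes that, with all patterns labeled, memberships are crisp and each cluster prototype is the class mean, and it uses a hypothetical index: the squared distance to the nearest other prototype along a feature, divided by the within-cluster variance of that feature. The function names and the `eps` parameter are illustrative.

```python
def class_prototypes(X, y):
    """Group patterns by label and return (prototypes, groups).

    Assumption: with fully labeled data, each cluster prototype is
    approximated by the mean vector of its class.
    """
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    protos = {c: [sum(col) / len(rows) for col in zip(*rows)]
              for c, rows in groups.items()}
    return protos, groups


def discrimination_index(X, y, eps=1e-12):
    """Hypothetical per-cluster, per-feature index (not the paper's formula).

    High when the feature has low variance inside the cluster and the
    cluster prototype is far from every other prototype on that feature.
    Requires at least two classes.
    """
    protos, groups = class_prototypes(X, y)
    n_features = len(X[0])
    index = {}
    for c, rows in groups.items():
        scores = []
        for f in range(n_features):
            # Variance of feature f over the patterns of cluster c.
            var = sum((r[f] - protos[c][f]) ** 2 for r in rows) / len(rows)
            # Squared distance to the closest other prototype on feature f.
            sep = min((protos[c][f] - protos[o][f]) ** 2
                      for o in protos if o != c)
            scores.append(sep / (var + eps))
        index[c] = scores
    return index


def select_features(X, y, tau):
    """Keep features whose index exceeds tau for at least one cluster."""
    index = discrimination_index(X, y)
    return [f for f in range(len(X[0]))
            if any(index[c][f] > tau for c in index)]


# Toy data: feature 0 separates the two classes, feature 1 does not.
X_demo = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0],
          [1.0, 1.0], [1.2, 5.0], [1.1, 3.0]]
y_demo = [0, 0, 0, 1, 1, 1]
print(select_features(X_demo, y_demo, tau=1.0))  # prints [0]
```

In this toy example both class prototypes coincide on feature 1, so its index is zero and it is discarded, while feature 0 has a small within-class variance and well-separated prototypes. The retained columns could then be passed to a k-nearest neighbors classifier, as in the evaluation described above.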