Unsupervised fuzzy-rough set-based dimensionality reduction

  • Authors:
  • Neil Mac ParthaláIn;Richard Jensen

  • Affiliations:
  • Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion, SY23 3DB Wales, UK;Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion, SY23 3DB Wales, UK

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Quantified Score

Hi-index 0.07

Visualization

Abstract

Each year worldwide, more and more data is collected. In fact, it is estimated that the amount of data collected and stored at least doubles every 2years. Of this data, a large percentage is unlabelled or has labels which are incomplete or missing. It is because this data is so large that it becomes very difficult for humans to manually assign labels to data objects. Additionally, many real-world application datasets such as those in gene expression analysis, and text classification are also of large dimensionality. This further frustrates the process of label assignment for domain experts as not all of the features are relevant or necessary in order to assign a given label. Hence unsupervised feature selection is required. For supervised learning, feature selection algorithms attempt to maximise a given function of predictive accuracy. This function typically considers the ability of feature vectors to reflect decision class labels. However, for the unsupervised learning task, decision class labels are not provided, which poses questions such as: which features should be retained? In fact, not all features are important and some are irrelevant, redundant or noisy. In this paper, several unsupervised FS approaches are presented which are based on fuzzy-rough sets. These approaches require no thresholding information, are domain-independent, and can operate on real-valued data without the need for discretisation. They offer a significant reduction in dimensionality whilst retaining the semantics of the data, and can even result in supersets of the supervised fuzzy-rough approaches. The approaches are compared with some supervised techniques and are shown to retain useful features.