Unsupervised fuzzy-rough set-based dimensionality reduction

Authors:
Neil Mac Parthaláin; Richard Jensen
Affiliations:
Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion, SY23 3DB Wales, UK;Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion, SY23 3DB Wales, UK
Venue:
Information Sciences: an International Journal
Year:
2013


Abstract

Each year, more and more data is collected worldwide. In fact, it is estimated that the amount of data collected and stored at least doubles every 2 years. Of this data, a large percentage is unlabelled or has labels which are incomplete or missing. Because this data is so large, it becomes very difficult for humans to manually assign labels to data objects. Additionally, many real-world application datasets, such as those in gene expression analysis and text classification, are also of large dimensionality. This further frustrates the process of label assignment for domain experts, as not all of the features are relevant or necessary in order to assign a given label. Hence unsupervised feature selection (FS) is required. For supervised learning, feature selection algorithms attempt to maximise a given function of predictive accuracy. This function typically considers the ability of feature vectors to reflect decision class labels. However, for the unsupervised learning task, decision class labels are not provided, which poses questions such as: which features should be retained? In fact, not all features are important and some are irrelevant, redundant or noisy. In this paper, several unsupervised FS approaches are presented which are based on fuzzy-rough sets. These approaches require no thresholding information, are domain-independent, and can operate on real-valued data without the need for discretisation. They offer a significant reduction in dimensionality whilst retaining the semantics of the data, and can even select feature subsets that are supersets of those found by the supervised fuzzy-rough approaches. The approaches are compared with some supervised techniques and are shown to retain useful features.
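The abstract does not spell out the algorithms, so the following is only a minimal sketch of how a fuzzy-rough dependency measure might drive unsupervised feature selection on real-valued data without discretisation. Everything in it is an assumption made for illustration rather than the paper's own formulation: the range-normalised similarity relation, the minimum t-norm, the Łukasiewicz implicator, the simplified positive region, the backward-elimination loop, and the hypothetical function names `similarity`, `dependency` and `unsupervised_select`. The idea shown is that a feature can be dropped when every feature, including the dropped one, remains fully dependent on the features that are kept.

```python
import numpy as np

def similarity(col):
    """Fuzzy similarity between all object pairs for one real-valued feature.
    Assumed relation: 1 - |a(x) - a(y)| / range(a)."""
    rng = col.max() - col.min()
    if rng == 0:
        # A constant feature cannot distinguish any pair of objects.
        return np.ones((col.size, col.size))
    return 1.0 - np.abs(col[:, None] - col[None, :]) / rng

def dependency(sims, subset, target):
    """Degree to which feature `target` depends on the features in `subset`.
    Simplified positive region: pos(x) = min_y I(R_P(x, y), R_q(x, y)),
    with the Lukasiewicz implicator I(a, b) = min(1, 1 - a + b)."""
    r_p = np.minimum.reduce([sims[a] for a in subset])  # min t-norm over subset
    impl = np.minimum(1.0, 1.0 - r_p + sims[target])
    pos = impl.min(axis=1)
    return pos.mean()

def unsupervised_select(X, tol=1e-9):
    """Backward elimination: try removing each feature in turn and keep the
    removal only if every original feature is still fully dependent
    (within `tol`) on the features that remain."""
    X = np.asarray(X, dtype=float)
    features = list(range(X.shape[1]))
    sims = [similarity(X[:, a]) for a in features]
    kept = list(features)
    for a in features:
        candidate = [b for b in kept if b != a]
        if not candidate:
            continue  # never remove the last remaining feature
        if all(dependency(sims, candidate, q) >= 1.0 - tol for q in features):
            kept = candidate
    return kept

# Toy data: feature 2 is an exact copy of feature 0, so one of them is redundant.
X = np.array([
    [0.0, 0.0, 0.0],
    [0.5, 1.0, 0.5],
    [1.0, 0.2, 1.0],
])
print(unsupervised_select(X))
```

No class labels are used anywhere in this sketch: each feature in turn plays the role that the decision attribute would play in a supervised fuzzy-rough method. The similarity matrices are dense, so memory grows quadratically with the number of objects, which is a limitation of this illustration.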