Evolving ensembles of feature subsets towards optimal feature selection for unsupervised and semi-supervised clustering

Authors:
Mihaela Elena Breaban
Affiliations:
Faculty of Computer Science, Al. I. Cuza University, Iasi, Romania
Venue:
IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
Year:
2010

Abstract

Work in unsupervised learning centered on clustering has been extended with new paradigms to address the demands raised by real-world problems. In this regard, unsupervised feature selection has been proposed to remove noisy attributes that could mislead the clustering procedures. Additionally, semi-supervision has been integrated within existing paradigms because some background information usually exists in the form of a small number of similarity/dissimilarity constraints. In this context, the current paper investigates a method to perform feature selection and clustering simultaneously. The benefits of a semi-supervised approach that makes use of limited external information are highlighted in comparison with an unsupervised approach. The method uses an ensemble of near-optimal feature subsets delivered by a multi-modal genetic algorithm to quantify the relative importance of each feature to clustering.
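
The abstract does not say how the ensemble is turned into importance scores. One plausible reading, offered here only as an illustrative sketch and not as the paper's actual procedure, is to score each feature by the (optionally fitness-weighted) fraction of ensemble members that select it. The function name `feature_importance`, the binary-mask encoding of subsets, and the optional `weights` argument are all assumptions introduced for this example.

```python
from typing import List, Optional, Sequence


def feature_importance(
    ensemble: Sequence[Sequence[int]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Score each feature by how often the ensemble selects it.

    ensemble: one binary mask per feature subset (1 = feature selected).
              Assumed encoding; the paper's representation may differ.
    weights:  optional non-negative weight per subset, e.g. its fitness
              (hypothetical; defaults to equal weights).
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one feature subset")
    n_features = len(ensemble[0])
    if any(len(mask) != n_features for mask in ensemble):
        raise ValueError("all masks must have the same length")
    if weights is None:
        weights = [1.0] * len(ensemble)
    if len(weights) != len(ensemble):
        raise ValueError("need exactly one weight per feature subset")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Accumulate the weight of every subset that includes feature j.
    scores = [0.0] * n_features
    for mask, w in zip(ensemble, weights):
        for j, selected in enumerate(mask):
            if selected:
                scores[j] += w

    # Normalize so each score lies in [0, 1].
    return [s / total for s in scores]


# Toy ensemble of four subsets over four features (made-up data).
ensemble = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
importance = feature_importance(ensemble)
```

In this toy example the first feature appears in every subset and so receives the highest score, while features selected by only one subset score lowest. Passing per-subset fitness values as `weights` would let better subsets count for more, which is one possible design choice among several.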