Motivation: Many methods have been developed for selecting small, informative feature subsets from large, noisy data. Unsupervised methods, however, are scarce; existing examples use the variance of the data collected for each feature, or the projection of each feature on the first principal component. We propose a novel unsupervised criterion based on SVD-entropy, which scores each feature by its contribution to the entropy (CE), calculated on a leave-one-out basis. This criterion can be applied in four ways: simple ranking according to CE values (SR); forward selection by accumulating features according to which set produces the highest entropy (FS1); forward selection by accumulating features through the choice of the best CE out of the remaining ones (FS2); and backward elimination (BE) of the features with the lowest CE.

Results: We apply our methods to several benchmarks. In each case we evaluate the success of clustering the data in the selected feature spaces by measuring Jaccard scores with respect to known classifications. We demonstrate that feature filtering according to CE outperforms both the variance method and gene shaving. In some cases the analysis, based on a small set of selected features, outperforms the best score reported when all information was used. Our method calls for an optimal size of the relevant feature set, which turns out to be just a few percent of the number of genes in the two Leukemia datasets that we analyzed. Moreover, the most favored selected genes show significant GO enrichment in relevant cellular processes.

Abbreviations: Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Quantum Clustering (QC), Gene Shaving (GS), Variance Selection (VS), Backward Elimination (BE)

Contact: royke@cs.huji.ac.il

Conflicts of Interest: not reported
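The leave-one-out CE criterion and the simple-ranking (SR) variant described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes features are rows of the data matrix, uses the standard SVD-entropy definition (normalized squared singular values plugged into a Shannon entropy, scaled by log of the number of singular values), and the helper names `svd_entropy` and `ce_scores` are our own.

```python
import numpy as np

def svd_entropy(X):
    """SVD-entropy of a features-by-samples matrix X.

    The squared singular values are normalized to a probability
    distribution, and its Shannon entropy is scaled to lie in [0, 1].
    """
    s = np.linalg.svd(X, compute_uv=False)
    v = s**2 / np.sum(s**2)
    v = v[v > 0]  # drop exact zeros so 0*log(0) is treated as 0
    return float(-np.sum(v * np.log(v)) / np.log(len(s)))

def ce_scores(X):
    """Leave-one-out contribution to the entropy (CE) of each feature.

    CE_i is the drop in SVD-entropy when feature (row) i is removed;
    features whose removal lowers the entropy most are most informative.
    """
    e_full = svd_entropy(X)
    return np.array([e_full - svd_entropy(np.delete(X, i, axis=0))
                     for i in range(X.shape[0])])
```

Simple ranking (SR) then keeps the top-ranked features, e.g. `np.argsort(ce_scores(X))[::-1][:m]` for some chosen subset size m; the FS1/FS2 and BE variants wrap the same CE computation in greedy forward or backward loops.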