Motivation: Identifying significant genes among thousands of sequences on a microarray is a central challenge for cancer research in bioinformatics. The ultimate goal is to detect the genes that are involved in disease outbreak and progression. A multitude of methods have been proposed for this task of feature selection, yet the selected gene lists differ greatly between different methods. To accomplish biologically meaningful gene selection from microarray data, we have to understand the theoretical connections and the differences between these methods. In this article, we define a kernel-based framework for feature selection based on the Hilbert–Schmidt independence criterion and backward elimination, called BAHSIC. We show that several well-known feature selectors are instances of BAHSIC, thereby clarifying their relationship. Furthermore, by choosing a different kernel, BAHSIC allows us to easily define novel feature selection algorithms. As a further advantage, feature selection via BAHSIC works directly on multiclass problems.

Results: In a broad experimental evaluation, the members of the BAHSIC family reach high levels of accuracy and robustness when compared to other feature selection techniques. Experiments show that features selected with a linear kernel provide the best classification performance in general, but if strong non-linearities are present in the data then non-linear kernels can be more suitable.

Availability: Accompanying homepage is http://www.dbs.ifi.lmu.de/~borgward/BAHSIC

Contact: kb@dbs.ifi.lmu.de

Supplementary information: Supplementary data are available at Bioinformatics online.
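To make the idea behind the framework concrete, the following is a minimal sketch of HSIC-based backward elimination. It uses the standard biased empirical HSIC estimate, tr(KHLH)/(n-1)^2, and linear kernels on both the features and the labels; the function names `hsic` and `bahsic` and the single-feature-per-step elimination schedule are illustrative simplifications, not the authors' implementation.

```python
import numpy as np

def hsic(K, L):
    # Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2,
    # where H = I - (1/n) 11^T centers the kernel matrices.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def bahsic(X, y, n_keep):
    # Backward elimination sketch: repeatedly drop the feature whose
    # removal leaves the largest HSIC between the remaining features
    # and the labels (i.e. the feature the labels depend on least).
    # Linear kernels throughout, purely for illustration.
    active = list(range(X.shape[1]))
    L = np.outer(y, y).astype(float)           # linear kernel on labels
    while len(active) > n_keep:
        scores = []
        for j in active:
            rest = [f for f in active if f != j]
            K = X[:, rest] @ X[:, rest].T      # linear kernel on remaining features
            scores.append(hsic(K, L))
        active.pop(int(np.argmax(scores)))     # deleting this feature hurt HSIC least
    return active
```

Swapping the two kernel matrices `K` and `L` for non-linear ones (e.g. Gaussian) yields the other members of the family described above; since `L` can encode any label kernel, the same loop applies unchanged to multiclass problems.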