Linear Separability of Gene Expression Data Sets

Authors: Giora Unger; Benny Chor
Affiliations: Tel-Aviv University, Tel-Aviv; Tel-Aviv University, Tel-Aviv
Venue: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year: 2010

Abstract

We study simple geometric properties of gene expression data sets in which samples are taken from two distinct classes (e.g., two types of cancer). Specifically, we investigate the problem of linear separability for pairs of genes. If a pair of genes exhibits linear separation with respect to the two classes, then the joint expression level of the two genes is strongly correlated with the phenomenon of the sample being taken from one class or the other. This may indicate an underlying molecular mechanism relating the two genes and the phenomenon (e.g., a specific cancer). We developed and implemented novel, efficient algorithmic tools for finding all pairs of genes that induce a linear separation of the two sample classes. These tools are based on computational geometric properties and were applied to 10 publicly available cancer data sets. For each data set, we computed the number of actual separating pairs and compared it to an upper bound on the number expected by chance and to the numbers obtained empirically by shuffling the labels of the data at random. Seven out of these 10 data sets are highly separable. Statistically, this phenomenon is highly significant and very unlikely to occur at random. It is therefore reasonable to expect that it manifests a functional association between separating genes and the underlying phenotypic classes.
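
To make the notion of a separating gene pair concrete, the following is a minimal brute-force sketch in Python. It is not the authors' algorithm, which the abstract describes only as efficient and based on computational geometric properties. The sketch assumes each sample is a list of expression values and each label is 0 or 1; the function names (is_linearly_separable, separating_gene_pairs, shuffled_pair_counts) and the toy data are hypothetical. It relies on a standard fact about planar point sets: if two finite sets are strictly separable, a separating direction can be found either among the differences between points of opposite classes or among the perpendiculars to segments joining points of the same class.

```python
import random
from itertools import combinations


def is_linearly_separable(class_a, class_b):
    """True if some line puts class_a strictly on one side and class_b on the other."""
    if not class_a or not class_b:
        raise ValueError("both classes need at least one point")
    # Candidate normals: differences between points of opposite classes ...
    candidates = [(b[0] - a[0], b[1] - a[1]) for a in class_a for b in class_b]
    # ... and perpendiculars to segments joining points of the same class.
    for points in (class_a, class_b):
        for p, q in combinations(points, 2):
            candidates.append((p[1] - q[1], q[0] - p[0]))
    for wx, wy in candidates:
        proj_a = [wx * x + wy * y for x, y in class_a]
        proj_b = [wx * x + wy * y for x, y in class_b]
        # Strict separation: the two projected ranges do not touch.
        if max(proj_a) < min(proj_b) or max(proj_b) < min(proj_a):
            return True
    return False


def separating_gene_pairs(samples, labels):
    """Return all gene index pairs (g, h) whose 2D projection separates the classes."""
    n_genes = len(samples[0])
    pairs = []
    for g, h in combinations(range(n_genes), 2):
        class_a = [(s[g], s[h]) for s, y in zip(samples, labels) if y == 0]
        class_b = [(s[g], s[h]) for s, y in zip(samples, labels) if y == 1]
        if is_linearly_separable(class_a, class_b):
            pairs.append((g, h))
    return pairs


def shuffled_pair_counts(samples, labels, n_shuffles, seed=0):
    """Count separating pairs under randomly shuffled labels (empirical baseline)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    counts = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        counts.append(len(separating_gene_pairs(samples, shuffled)))
    return counts


# Toy data (made up for illustration): 4 samples, 3 genes.
samples = [[0, 0, 1], [1, 1, 2], [0, 1, 5], [1, 0, 6]]
labels = [0, 0, 1, 1]
print(separating_gene_pairs(samples, labels))  # [(0, 2), (1, 2)]
```

In the toy data, genes 0 and 1 form an XOR-like pattern and so do not separate the classes, while any pair involving gene 2 does. The check costs time cubic in the number of samples per gene pair, so it is suitable only for illustration. Comparisons are exact, so noisy real-valued data may call for a tolerance.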