Improved Feature Selection by Incorporating Gene Similarity into the LASSO

Authors:
Christopher E. Gillies; Xiaoli Gao; Nilesh V. Patel; Mohammad-Reza Siadat; George D. Wilson
Affiliations:
Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA; Department of Mathematics and Statistics, Oakland University, Rochester, MI, USA; Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA; Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA; Radiation Oncology Department and BioBank Department, Beaumont Health System, Royal Oak, MI, USA
Venue:
International Journal of Knowledge Discovery in Bioinformatics
Year:
2012

Abstract

Personalized medicine tailors treatments to a patient's genetic profile and has the potential to revolutionize medical practice. Gene expression profiling is an important tool in personalized medicine, but analyzing expression profiles is difficult: data sets usually contain few patients and thousands of genes, which leads to the curse of dimensionality. To combat this problem, researchers have suggested using prior knowledge to enhance feature selection for supervised learning algorithms. The authors propose an enhancement to the LASSO, a shrinkage and selection technique that induces parameter sparsity by adding an L1 penalty on the coefficients to a model's objective function. Their enhancement favors the selection of genes that are involved in similar biological processes, and it does so by penalizing interaction terms between genes. They devise a coordinate descent algorithm to minimize the corresponding objective function. To evaluate the method, the authors generated simulated data and compared their model with the standard LASSO and an interaction LASSO model. Given a reasonable number of training samples, their model outperformed both in detecting important genes and gene interactions. They also demonstrate the method on a real gene expression data set from lung cancer cell lines.
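
For readers unfamiliar with the baseline, the following is a minimal sketch of cyclic coordinate descent for the standard LASSO only, minimizing (1/(2n))·||y − Xb||² + λ·||b||₁. It is not the authors' modified method: the abstract does not specify their similarity-based penalty on interaction terms, so none is attempted here. The function names, the 1/(2n) scaling, the stopping rule, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink scalar z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200, tol=1e-8):
    """Standard LASSO via cyclic coordinate descent (illustrative sketch).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # per-feature curvature x_j'x_j / n
    residual = y - X @ beta                # kept in sync with beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                   # all-zero column carries no signal
            # Correlation of feature j with the partial residual
            # (residual with feature j's own contribution added back).
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual -= X[:, j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:               # assumed stopping rule
            break
    return beta


if __name__ == "__main__":
    # Hypothetical "few patients, many genes" setting: 30 samples, 200 features.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 200))
    true_beta = np.zeros(200)
    true_beta[:3] = [3.0, -2.0, 1.5]
    y = X @ true_beta + 0.1 * rng.standard_normal(30)
    beta_hat = lasso_coordinate_descent(X, y, lam=0.5)
    print("selected feature indices:", np.flatnonzero(beta_hat))
```

The soft-thresholding step is what sets coefficients exactly to zero and so performs feature selection. A method along the lines described in the abstract would add interaction columns and change how each is penalized, but those details would have to come from the paper itself.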