Network-constrained regularization and variable selection for analysis of genomic data

Authors:
Caiyan Li;Hongzhe Li
Venue:
Bioinformatics
Year:
2008

Citing 0
Cited 14

Integration of Full-Coverage Probabilistic Functional Networks with Relevance to Specific Biological Processes
DILS '09 Proceedings of the 6th International Workshop on Data Integration in the Life Sciences

Modeling oncology gene pathways network with multiple genotypes and phenotypes via a copula method
CIBCB'09 Proceedings of the 6th Annual IEEE conference on Computational Intelligence in Bioinformatics and Computational Biology

Boosting with structure information in the functional space: an application to graph classification
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Regularization and feature selection for networked features
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Semi-supervised learning of sparse linear models in mass spectral imaging
PRIB'10 Proceedings of the 5th IAPR international conference on Pattern recognition in bioinformatics

Network-based sparse Bayesian classification
Pattern Recognition

Improving accuracy of microarray classification by a simple multi-task feature selection filter
International Journal of Data Mining and Bioinformatics

Support Vector Machine incorporated with feature discrimination
Expert Systems with Applications: An International Journal

An experimental comparison of gene selection by Lasso and Dantzig selector for cancer classification
Computers in Biology and Medicine

Feature grouping and selection over an undirected graph
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Sparse methods for biomedical data
ACM SIGKDD Explorations Newsletter

Mining discriminative subgraphs from global-state networks
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Computational regulatory network construction from microRNA and transcription factor perspectives
ACM SIGBioinformatics Record

A novel network and sparsity constraint regression model for functional module identification in genomic data analysis
International Journal of Data Mining and Bioinformatics

Abstract

Motivation: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information, gathered over many years of biomedical research, is a useful supplement to standard numerical genomic data such as microarray gene-expression data. Incorporating the information encoded by known biological networks or graphs into the analysis of numerical data raises interesting statistical challenges. In this article, we introduce a network-constrained regularization procedure for linear regression analysis that incorporates the information from these graphs into the analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network.

Results: Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than commonly used procedures that do not use the pathway structure information. Application to one glioblastoma microarray gene-expression dataset identified several subnetworks on several of the Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to survival from glioblastoma, many of which were supported by the published literature.

Conclusions: The proposed network-constrained regularization procedure efficiently utilizes the known pathway structures in identifying the relevant genes and the subnetworks that might be related to phenotype in a general regression framework. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes.

Contact: hongzhe@mail.med.upenn.edu
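
The abstract describes the penalty only in words: an L1 term on the coefficients plus a term that encourages smoothness over the network through the graph Laplacian. A minimal Python sketch of one penalty of that shape follows. The exact form, the use of the symmetric normalized Laplacian, and the names `normalized_laplacian`, `network_penalty`, `lam1` and `lam2` are all assumptions made for illustration; they are not taken from the abstract and may differ from the authors' definition.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.

    Assumed form: the abstract only says the network is represented by
    "its corresponding Laplacian matrix". Isolated nodes get all-zero rows.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    # Scale row u by 1/sqrt(d_u) and column v by 1/sqrt(d_v).
    scaled = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.diag(connected.astype(float)) - scaled


def network_penalty(beta, lap, lam1, lam2):
    """Hypothetical penalty: lam1 * ||beta||_1 + lam2 * beta^T L beta."""
    beta = np.asarray(beta, dtype=float)
    sparsity = np.abs(beta).sum()      # L1 term, drives coefficients to zero
    smoothness = beta @ lap @ beta     # small when linked genes have similar scaled coefficients
    return lam1 * sparsity + lam2 * smoothness


# Toy network: a path of three genes, 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
lap = normalized_laplacian(adj)
print(network_penalty([1.0, 0.0, 0.0], lap, lam1=0.5, lam2=2.0))  # → 2.5
```

With this Laplacian the quadratic term equals the sum over edges of the squared difference between degree-scaled coefficients, so a coefficient vector that varies smoothly along the network incurs little penalty, while the L1 term still favors selecting few genes.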