Selection of Patient Samples and Genes for Outcome Prediction

Authors:
Huiqing Liu;Jinyan Li;Limsoon Wong
Affiliations:
Institute for Infocomm Research;Institute for Infocomm Research;Institute for Infocomm Research
Venue:
CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
Year:
2004

Abstract

Gene expression profiles with clinical outcome data enable monitoring of disease progression and prediction of patient survival at the molecular level. We present a new computational method for outcome prediction. Our idea is to use an informative subset of the original training samples. This subset consists only of short-term survivors, who died within a short period, and long-term survivors, who were still alive after a long follow-up time. These extreme training samples provide a clear basis for identifying genes whose expression is related to survival. To find relevant genes, we combine two feature selection methods, an entropy measure and the Wilcoxon rank sum test, so that a set of sharply discriminating features is identified. The selected training samples and genes are then integrated by a support vector machine to build a prediction model, by which each validation sample is assigned a survival/relapse risk score for drawing Kaplan-Meier survival curves. We apply this method to two data sets: diffuse large-B-cell lymphoma (DLBCL) and primary lung adenocarcinoma. In both cases, patients in the high- and low-risk groups stratified by our risk scores are clearly distinguishable. We also compare our risk scores to clinical factors, such as the International Prognostic Index score for DLBCL and tumor stage for lung adenocarcinoma. Our results indicate that gene expression profiles combined with carefully chosen learning algorithms can predict patient survival for certain diseases.
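
The first two steps described above, choosing extreme training samples and scoring a gene with the Wilcoxon rank sum statistic, can be illustrated with a minimal sketch. It is not the authors' implementation. The `Patient` record, the function names, and the cutoff parameters `short_cutoff` and `long_cutoff` are assumptions introduced here for illustration; the abstract does not state the thresholds used. The entropy measure, the support vector machine, and the Kaplan-Meier step are omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Patient:
    # Hypothetical record layout; field names are assumptions, not from the paper.
    pid: str
    followup_years: float    # time to death or to last follow-up
    died: bool               # True if death was observed
    expression: List[float]  # one expression value per gene


def select_extreme(
    patients: Sequence[Patient], short_cutoff: float, long_cutoff: float
) -> Tuple[List[Patient], List[Patient]]:
    """Split off the two extreme groups; all other patients are left out.

    short-term: death observed at or before short_cutoff
    long-term:  follow-up of at least long_cutoff, so alive at that time
    Both cutoffs are illustrative parameters (assumed, not from the source).
    """
    short_term = [p for p in patients
                  if p.died and p.followup_years <= short_cutoff]
    long_term = [p for p in patients if p.followup_years >= long_cutoff]
    return short_term, long_term


def rank_sum(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Wilcoxon rank sum statistic: sum of the ranks of group_a's values
    in the pooled sample, with tied values sharing their average rank."""
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    total = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values pooled[i:j].
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        members_of_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        total += avg_rank * members_of_a
        i = j
    return total


def gene_rank_sum(short_term: Sequence[Patient],
                  long_term: Sequence[Patient], gene: int) -> float:
    """Rank sum of one gene's expression in the short-term group."""
    a = [p.expression[gene] for p in short_term]
    b = [p.expression[gene] for p in long_term]
    return rank_sum(a, b)
```

A rank sum near its minimum or maximum possible value for the given group sizes suggests that the gene separates the two extreme groups well. In practice the statistic would be compared against a significance threshold, which this sketch leaves out.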