The improvement of breast cancer prognosis accuracy from integrated gene expression and clinical data

  • Authors:
  • Austin H. Chen;Chenyin Yang

  • Affiliations:
  • Department of Medical Informatics, Tzu Chi University, No. 701, Sec. 3, Jhongyang Rd., Hualien City, Hualien County 97004, Taiwan;Graduate Institute of Medical Informatics, Tzu Chi University, No. 701, Sec. 3, Jhongyang Rd., Hualien City, Hualien County 97004, Taiwan

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Abstract

Accurately predicting breast cancer prognosis from high-throughput microarray data is a challenging task. Although many statistical methods and machine learning techniques have been applied to predict the prognosis of breast cancer, they suffer from low prediction accuracy (usually lower than 70%). In this paper, we propose a method that combines a genetic algorithm with a support vector machine (which we call GASVM) to significantly improve the accuracy of breast cancer prognosis prediction from gene expression profiles. To further improve classification performance, we also apply the GASVM model to combined clinical and microarray data. We evaluate the performance of the GASVM model on data from 97 breast cancer patients. Four gene sets are compared: all genes (All), 70 correlation-selected genes (C70), 15 medical literature-selected genes (R15), and 50 t-test-selected genes (T50). With the optimized parameter values identified by the GASVM model, the average predictive accuracy approaches 95% for T50 and 90% for C70 or R15 across all four kernel functions when integrated clinical and microarray data are used. This exceeds the average predictive accuracy of about 70% reported for other machine learning methods. The results indicate that the GASVM model has the potential to better assist physicians in the prognosis of breast cancer through the use of both clinical and microarray data.
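
As a rough illustration of the t-test-based gene selection step (T50), the sketch below ranks genes by the absolute value of a two-sample t-statistic between outcome classes and keeps the top k. The abstract does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here. The function names `welch_t` and `select_top_genes` and the synthetic data are hypothetical and not taken from the paper.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (assumed variant)."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0.0:
        return 0.0  # identical constant samples: no evidence of a difference
    return (mean(a) - mean(b)) / se2 ** 0.5


def select_top_genes(expr, labels, k):
    """Return indices of the k genes with the largest |t| between classes.

    expr   : list of samples, each a list of per-gene expression values
    labels : list of 0/1 outcome labels, one per sample
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, y in zip(expr, labels) if y == 1]
        neg = [row[g] for row, y in zip(expr, labels) if y == 0]
        scores.append(abs(welch_t(pos, neg)))
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


# Synthetic demo data (NOT the paper's data): 40 samples, 200 genes,
# with genes 0-4 shifted upward in the positive class.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
expr = [[rng.gauss(3.0 * y if g < 5 else 0.0, 1.0) for g in range(200)]
        for y in labels]

top = select_top_genes(expr, labels, k=5)
print(sorted(top))
```

In a setting like the one described, `k` would be 50 and the selected columns would then be passed, optionally together with clinical variables, to the classifier. To avoid optimistic accuracy estimates, the selection should be computed on training folds only.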