The nature of statistical learning theory
Ovarian cancer is a primary gynecological cancer whose pathological stages include benign, borderline, and invasive stages, and it causes death in many countries. In this paper, linear regression, analysis of variance (ANOVA), and a support vector machine (SVM) are used to identify gene markers of ovarian cancer in an authentic cDNA expression dataset comprising 8 normal ovarian tumors, 6 borderline tumors, 7 stage I ovarian cancers, and 9 stage III ovarian cancers. First, linear regression analysis yields the 200 genes with the largest residuals. From these, 14 genes are then selected by ANOVA and the Scheffe test at a P-value threshold of less than 0.000005. Next, we use a support vector machine to classify the pathological stages from the gene expressions. Five experiments are performed under different clustering conditions. In the first clustering experiment, cluster 1 contains the BOT (borderline ovarian tumor) samples and cluster 2 contains all other pathological stages. Gene expression differs significantly at the BOT stage, and this experiment reaches an average accuracy of about 95.686% in cross-validation, which indicates that pathological stages can be classified quite precisely from gene expressions. The average accuracy over all clustering experiments is about 88.541% in cross-validation. In addition, we developed a statistical analysis system that provides linear regression and ANOVA functions for gene expression analysis. The experimental results and the analysis system can assist biologists and doctors in researching and diagnosing ovarian cancer from gene expressions.
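The abstract does not say how the ANOVA step was computed, so the following is only a minimal Python sketch of a one-way ANOVA F statistic for a single gene measured across several sample groups. The function name `one_way_anova_f` and the toy numbers are illustrative assumptions, not the authors' system or data. Turning F into a P-value additionally requires the F-distribution tail probability, which is left out here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for one gene.

    `groups` is a list of lists; each inner list holds the expression
    values of the gene in one sample group (hypothetical layout).
    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values for illustration only (not from the paper's dataset).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

With four groups of 8, 6, 7, and 9 samples, as in the dataset described above, the same function would report 3 and 26 degrees of freedom. A gene would be kept when the P-value derived from its F statistic falls below the chosen threshold.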