Predicting deleterious non-synonymous single nucleotide polymorphisms in signal peptides based on hybrid sequence attributes

Authors:
Wenli Qin; Yizhou Li; Juan Li; Lezheng Yu; Di Wu; Runyu Jing; Xuemei Pu; Yanzhi Guo; Menglong Li
Affiliations:
College of Chemistry, Sichuan University, Chengdu 610064, PR China (all authors)
Venue:
Computational Biology and Chemistry
Year:
2012

Abstract

Signal peptides play a crucial role in various biological processes, such as the localization of cell surface receptors, the translocation of secreted proteins and cell-cell communication. However, amino acid mutations in signal peptides caused by non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) may lead to the loss of their function. In the present study, we propose a random forest (RF) based computational method for predicting deleterious nsSNPs in signal peptides that incorporates the position-specific scoring matrix (PSSM) profile, the SignalP score and physicochemical properties. These features were optimized with the maximum relevance minimum redundancy (mRMR) method. A cost matrix was then used to reduce the effect of the class imbalance that commonly occurs in nsSNP prediction. With an optimal subset of 10 features, the method achieved an overall accuracy of 84.5% and an area under the ROC curve (AUC) of 0.822 in a jackknife test. On the same dataset, we also compared our predictor with two existing approaches, the R-score-based and D-score-based methods, and our method outperformed both. This performance suggests that our method is effective in predicting deleterious nsSNPs in signal peptides.
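The feature selection and cost-matrix steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes features that are already discretized, uses the mutual information difference form of greedy mRMR, and applies a 2x2 cost matrix as a minimum expected cost decision on a predicted probability. The feature names, the toy data and the cost values (5.0 and 1.0) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR: relevance to the label minus mean redundancy with chosen features.

    features maps a feature name to a list of discrete values.
    Returns the names of up to k features in the order they were selected.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def min_cost_label(p_deleterious, cost_fn=5.0, cost_fp=1.0):
    """Choose the label with the lower expected cost (correct predictions cost 0).

    cost_fn: hypothetical cost of calling a deleterious variant neutral.
    cost_fp: hypothetical cost of calling a neutral variant deleterious.
    """
    cost_if_neutral = p_deleterious * cost_fn
    cost_if_deleterious = (1.0 - p_deleterious) * cost_fp
    return "deleterious" if cost_if_deleterious < cost_if_neutral else "neutral"


# Toy data (hypothetical): 1 = deleterious, 0 = neutral.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "pssm_bin": [1, 1, 1, 0, 0, 0, 0, 0],
    "pssm_bin_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate, fully redundant
    "hydrophobicity_bin": [1, 1, 0, 1, 1, 0, 0, 0],
}
selected = mrmr_select(features, labels, k=2)
```

On this toy data the duplicated column is passed over in favour of the less relevant but less redundant feature, which is the behaviour mRMR is designed to give. In `min_cost_label`, raising `cost_fn` relative to `cost_fp` lowers the probability at which a variant is called deleterious, which is one common way a cost matrix is used to counter class imbalance.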