About 15% of all proteins in a genome contain a signal peptide (SP) sequence at the N-terminus that targets the protein to intracellular secretory pathways. Once the protein has been targeted correctly in the cell, the SP is cleaved, releasing the mature protein. Accurate prediction of these short amino-acid SP chains is crucial for modelling the topology of membrane proteins, since SP sequences can be confused with transmembrane domains owing to their similar composition of hydrophobic amino acids. This paper presents a cascaded Support Vector Machine (SVM)-Neural Network (NN) classification methodology for SP discrimination and cleavage site identification. The proposed method uses a two-phase classification approach. In phase one, an SVM acts as the primary classifier and discriminates SP from Non-SP sequences, using hydrophobic propensities extracted by symmetric sliding-window analysis of the amino-acid sequence as its primary feature vector. In phase two, an NN uses asymmetric sliding-window sequence analysis to predict the most suitable cleavage site candidates. The proposed SVM-NN method was tested on UniProt non-redundant datasets of eukaryotic and prokaryotic proteins with SP and Non-SP N-termini. Computer simulation results demonstrate an overall performance of 0.90, measured by the Matthews Correlation Coefficient (MCC), for SP and Non-SP discrimination using the SVM. For SP cleavage site prediction, the overall accuracy is 91.5% based on cross-validation tests using the novel SVM-NN model.
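The two windowing schemes and the MCC measure mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not name the hydrophobicity scale or the window sizes, so the Kyte-Doolittle scale, the function names, and the `half_width`, `left`, and `right` parameters below are all assumptions chosen for illustration.

```python
import math

# Kyte-Doolittle hydropathy scale (assumed; the abstract does not say which scale is used).
KD = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}


def symmetric_window_hydropathy(seq, half_width=3):
    """Mean hydropathy over a window centred on each residue.

    The window spans `half_width` residues on either side and is
    truncated at the sequence ends. Unknown residues score 0.0.
    """
    scores = [KD.get(aa, 0.0) for aa in seq]
    features = []
    for i in range(len(scores)):
        lo = max(0, i - half_width)
        hi = min(len(scores), i + half_width + 1)
        window = scores[lo:hi]
        features.append(sum(window) / len(window))
    return features


def asymmetric_windows(seq, left=5, right=2, pad='X'):
    """Windows around each candidate cleavage position.

    Position i means a cut between seq[i-1] and seq[i]. Each window
    holds `left` residues before the cut and `right` residues after it,
    padded with `pad` where the window runs off the sequence.
    """
    padded = pad * left + seq + pad * right
    return [(i, padded[i:i + left + right]) for i in range(1, len(seq))]


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    n_terminus = "MKTLLA"  # made-up toy sequence, not real data
    print(symmetric_window_hydropathy(n_terminus, half_width=1))
    print(asymmetric_windows(n_terminus, left=3, right=2))
    print(mcc(tp=45, tn=45, fp=5, fn=5))
```

In a cascade like the one described, the symmetric-window features would feed the phase-one SVM, and the asymmetric windows, suitably encoded, would feed the phase-two NN. A longer left side reflects the idea that residues upstream of the cleavage site carry more signal, though the actual proportions used in the paper are not stated in the abstract.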