Neural Networks for Pattern Recognition
Predicting O-glycosylation sites in mammalian proteins by using SVMs
Computational Biology and Chemistry
KES'06 Proceedings of the 10th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part II
Glycosylation is one of the most important post-translational modification steps in the synthesis of membrane and secreted proteins, and more than half of all proteins are glycosylated. In this paper, we propose a principal component analysis (PCA) based subspace approach for pattern analysis and prediction of O-glycosylation sites in proteins. PCA is used to find the principal components and subspaces of glycosylated and nonglycosylated residues, respectively. From the calculated principal components, we found that the glycosylated proteins all have a high serine, threonine and proline content. The prediction can be viewed as a single four-class classification problem or as a set of two-class classification problems. We project the protein sequence (test vector) onto each subspace and calculate the distance between the test vector and its projection on that subspace. The protein sequence is then assigned to the "nearest" class, that is, the class whose subspace yields the smallest distance. The prediction accuracy for nonglycosylated sites (negative sites) is about 70%-90%, and the accuracy for O-glycosylated sites (positive sites) is about 70%-100%.
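The nearest-subspace rule described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes each sequence window has already been encoded as a fixed-length numeric vector (the abstract does not specify the encoding), that each class subspace is centered on the class mean, and that the subspace dimension `k` is chosen by the user. The function names `fit_subspace`, `residual_distance` and `classify` are hypothetical, and the data are synthetic.

```python
import numpy as np

def fit_subspace(X, k):
    """Return (mean, basis); basis columns are the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are principal directions,
    # ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def residual_distance(x, subspace):
    """Distance between x and its orthogonal projection onto the class subspace."""
    mean, basis = subspace
    centered = x - mean
    projection = basis @ (basis.T @ centered)
    return float(np.linalg.norm(centered - projection))

def classify(x, subspaces):
    """Assign x to the class whose subspace is nearest."""
    return min(subspaces, key=lambda label: residual_distance(x, subspaces[label]))

# Synthetic stand-in data (assumption): each class varies mainly along one axis.
rng = np.random.default_rng(0)
dim = 20

def sample_class(direction, n):
    coeffs = 5.0 * rng.normal(size=(n, 1))
    return coeffs * direction + 0.1 * rng.normal(size=(n, dim))

dir_pos = np.zeros(dim)
dir_pos[0] = 1.0
dir_neg = np.zeros(dim)
dir_neg[1] = 1.0

subspaces = {
    "glycosylated": fit_subspace(sample_class(dir_pos, 200), k=1),
    "nonglycosylated": fit_subspace(sample_class(dir_neg, 200), k=1),
}

test_pos = 3.0 * dir_pos
test_neg = -4.0 * dir_neg
print(classify(test_pos, subspaces))
print(classify(test_neg, subspaces))
```

The same `classify` function covers both formulations mentioned in the abstract: passing four fitted subspaces gives a four-class decision, while passing two gives a binary one.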