A method for improving protein localization prediction from datasets with outliers

Authors:
Jiang Tian;Hong Gu;Wenqi Liu
Affiliations:
School of Electronic and Information Engineering, Dalian University of Technology, Dalian, China;School of Electronic and Information Engineering, Dalian University of Technology, Dalian, China;School of Electronic and Information Engineering, Dalian University of Technology, Dalian, China
Venue:
CIBCB'09 Proceedings of the 6th Annual IEEE conference on Computational Intelligence in Bioinformatics and Computational Biology
Year:
2009

Abstract

Large-scale genome analysis and drug discovery require automated methods for predicting protein subcellular localization, and Support Vector Machines (SVMs) solve this problem effectively in a supervised manner. However, protein subcellular localization datasets obtained from experiments always contain outliers, which can degrade generalization ability and classification accuracy. To address this issue, we first analyzed the influence of Principal Component Analysis (PCA) on classification performance, and then proposed a hybrid method for predicting protein subcellular localization based on the Weighted Support Vector Machine (WSVM) and PCA. Different weights were assigned to different data points, so that the training algorithm could learn the decision boundary according to the relative importance of each data point. After dimension reduction was performed on the datasets, kernel-based possibilistic c-means (KPCM) was chosen to generate the weights for this algorithm, as it assigns relatively high values to important data points and low values to outliers. Experiments on a benchmark dataset show promising results, confirming the effectiveness of the proposed method in terms of prediction accuracy.
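
The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian kernel, a single prototype per class, a fuzzifier m = 2, and a scale parameter eta fixed to the mean kernel-induced distance from the initial center. The function names `gaussian_kernel` and `kpcm_weights`, and all parameter defaults, are hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(x, v, sigma):
    """Gaussian (RBF) kernel between two points given as equal-length sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kpcm_weights(points, sigma=1.0, m=2.0, n_iter=20):
    """Possibilistic typicality of each point with respect to one prototype.

    Assumptions (not taken from the paper): one cluster per call, Gaussian
    kernel, and eta estimated once from the initial center.
    """
    n, dim = len(points), len(points[0])
    # Start from the plain mean of the points.
    center = [sum(p[j] for p in points) / n for j in range(dim)]
    # For a Gaussian kernel, the squared feature-space distance is 2 * (1 - K(x, v)).
    dists = [2.0 * (1.0 - gaussian_kernel(p, center, sigma)) for p in points]
    eta = sum(dists) / n or 1e-12  # guard against identical points
    weights = [1.0] * n
    for _ in range(n_iter):
        k = [gaussian_kernel(p, center, sigma) for p in points]
        # Possibilistic membership: close to 1 near the prototype, small far away.
        weights = [
            1.0 / (1.0 + (2.0 * (1.0 - ki) / eta) ** (1.0 / (m - 1.0)))
            for ki in k
        ]
        # Prototype update: mean of the points weighted by u^m * K(x, v).
        coef = [(w ** m) * ki for w, ki in zip(weights, k)]
        total = sum(coef)
        center = [
            sum(c * p[j] for c, p in zip(coef, points)) / total
            for j in range(dim)
        ]
    return weights


# Five tightly grouped points and one distant outlier (made-up toy data).
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (-0.1, 0.0), (8.0, 8.0)]
w = kpcm_weights(toy)
print([round(x, 3) for x in w])
```

In a pipeline of the kind the abstract outlines, the data would first be projected with PCA, weights like these would be computed per class, and the weights would then be passed to a weighted SVM trainer. With scikit-learn, for example, `SVC.fit` accepts a `sample_weight` argument, although the abstract does not say which tooling the authors used.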