Sequence-driven features for prediction of subcellular localization of proteins

Authors:
Jong Kyoung Kim; Sung-Yang Bang; Seungjin Choi
Affiliations:
Department of Computer Science, Pohang University of Science and Technology, San 31 Hyoja-dong, Nam-gu, Pohang 790-784, Korea (all authors)
Venue:
Pattern Recognition
Year:
2006

Abstract

Predicting the subcellular location of a protein plays an important role in inferring its function. Feature extraction is a critical part of any prediction system: raw sequence data must be transformed into numerical feature vectors while minimizing information loss. In this paper, we present a method for extracting useful features from protein sequence data. The method employs local and global pairwise sequence alignment scores as well as composition-based features. Five different feature sets are each used to train a separate support vector machine (SVM), and weighted majority voting makes the final decision. The overall prediction accuracy, evaluated by 5-fold cross-validation, reached 88.53% on the eukaryotic animal data set. Comparing the prediction accuracy of the various feature extraction methods provides biological insight into the location of targeting information. Our experimental results confirm that the proposed feature extraction methods are useful for predicting the subcellular localization of proteins.
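
As a rough illustration of two ingredients named in the abstract, the sketch below computes a plain amino-acid composition vector and combines class labels by weighted majority voting. It is only a sketch under stated assumptions: the function names, the example labels, and the weights are hypothetical, the abstract does not say how the voting weights are chosen, and the SVM training and pairwise alignment scoring steps are omitted entirely.

```python
from collections import Counter, defaultdict

# The 20 standard amino-acid residues, in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino-acid composition of a sequence.

    Residues outside the 20 standard letters are ignored (an assumption
    made for this sketch, not something stated in the abstract).
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest summed weight.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (hypothetical here;
    it could, for example, be a validation accuracy).
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have equal length")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


if __name__ == "__main__":
    vec = composition("MKTAYIAKQR")
    print(len(vec), round(sum(vec), 6))

    # Five hypothetical classifiers, one per feature set.
    labels = ["nuclear", "cytoplasmic", "nuclear", "cytoplasmic", "cytoplasmic"]
    print(weighted_majority_vote(labels, [0.9, 0.3, 0.8, 0.3, 0.3]))
```

In this toy vote the three "cytoplasmic" classifiers are outvoted because their combined weight is lower than that of the two "nuclear" classifiers, which is the difference between weighted and simple majority voting.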