CE-PLoc: An ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition

  • Authors:
  • Asifullah Khan; Abdul Majid; Maqsood Hayat

  • Affiliations:
  • Department of Information and Computer Sciences, Pakistan Institute of Engineering and Applied Sciences, P.O. 45650, Nilore, Islamabad, Pakistan (all three authors)

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2011

Abstract

Precise information about the location of a protein in a cell facilitates understanding of the protein's function and its interactions in the cellular environment. This information also aids the study of specific metabolic pathways and other biological processes. We propose an ensemble approach, called CE-PLoc, for predicting protein subcellular locations by fusing individual classifiers. The proposed approach uses features obtained from two feature extraction strategies: dipeptide composition (DC) and amphiphilic pseudo amino acid composition (PseAAC). For a selected base learner, different feature spaces are obtained by varying the dimensionality of PseAAC. We first analyze the performance of individual learning mechanisms, namely support vector machine, nearest neighbor, probabilistic neural network, and covariant discriminant, each trained on PseAAC-based features. Classifiers are then developed using the same learning mechanism but trained on PseAAC-based feature spaces of varying dimensions; combining these classifiers through a voting strategy improves prediction performance. Prediction performance is further enhanced in CE-PLoc by combining different learning mechanisms trained on both the DC-based feature space and the PseAAC-based feature spaces of varying dimensions. The predictive performance of CE-PLoc is evaluated on two benchmark datasets of protein subcellular locations using accuracy, the Matthews correlation coefficient (MCC), and the Q-statistic. Under the jackknife test, CE-PLoc achieves prediction accuracies of 81.47% and 83.99% on the 12-location and 14-location datasets, respectively. On the independent dataset test, the prediction accuracies are 87.04% and 87.33% on the 12-class and 14-class datasets, respectively.
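To illustrate the DC feature space, the following Python sketch computes the standard 400-dimensional dipeptide composition. Each entry is the frequency of one ordered pair of adjacent residues, drawn from the 20 standard amino acids. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `dipeptide_composition` and `INDEX` are hypothetical. Skipping pairs that contain non-standard residues (such as `X`) is also an assumption.

```python
from itertools import product

# The 20 standard amino acids, in a fixed (assumed) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq: str) -> list[float]:
    """Return the 400-dim dipeptide composition of a protein sequence.

    Each entry is the count of one ordered adjacent residue pair divided by
    the number of valid overlapping pairs. Pairs containing non-standard
    residues are skipped (an assumption, not taken from the paper).
    """
    seq = seq.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for a, b in zip(seq, seq[1:]):  # overlapping adjacent pairs
        idx = INDEX.get(a + b)
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]
```

For the sequence `"ACAC"`, the pairs are AC, CA, AC. The `AC` entry is therefore 2/3, the `CA` entry is 1/3, and all other entries are zero.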
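The voting step can be sketched as plain majority voting over the location labels that the individual classifiers predict for each protein. The abstract does not specify the exact voting rule or how ties are resolved. The functions `majority_vote` and `ensemble_predict` are therefore hypothetical, and so is the tie-breaking rule (prefer the tied label that appears first).

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most classifiers for one protein.

    `predictions` holds one predicted location label per base classifier.
    Ties go to the tied label appearing first in `predictions`
    (a hypothetical rule; the abstract does not state one).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label
    raise AssertionError("unreachable")


def ensemble_predict(per_classifier: Sequence[Sequence[str]]) -> list[str]:
    """Fuse predictions where per_classifier[k][i] is classifier k's label for protein i."""
    # zip(*...) regroups the labels per protein (one column per protein).
    return [majority_vote(list(column)) for column in zip(*per_classifier)]
```

Consider three classifiers that predict `["nucleus", "cytoplasm"]`, `["nucleus", "mitochondrion"]` and `["cytoplasm", "mitochondrion"]` for two proteins. `ensemble_predict` returns `["nucleus", "mitochondrion"]`. In this sketch, each base classifier (for example, the same learner trained on PseAAC spaces of different dimensions) contributes one equal-weight vote.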
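The two non-accuracy measures can be written out directly. The sketch below computes two quantities:

- a one-versus-rest MCC for a single location class;
- Yule's Q-statistic between the per-protein correct/incorrect outcomes of two classifiers, the usual pairwise diversity measure for ensembles.

That the paper uses exactly these definitions is an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence


def mcc_one_vs_rest(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """Matthews correlation coefficient treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def q_statistic(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Yule's Q between two classifiers' per-sample correctness.

    Q is near 1 when the classifiers tend to be right and wrong on the same
    samples, near 0 when their errors are unrelated, and negative when they
    tend to err on different samples.
    """
    pairs = list(zip(correct_a, correct_b))
    n11 = sum(a and b for a, b in pairs)          # both correct
    n00 = sum(not a and not b for a, b in pairs)  # both wrong
    n10 = sum(a and not b for a, b in pairs)      # only A correct
    n01 = sum(not a and b for a, b in pairs)      # only B correct
    denom = n11 * n00 + n01 * n10
    return 0.0 if denom == 0 else (n11 * n00 - n01 * n10) / denom
```

Low or negative Q between base classifiers indicates diverse errors, which is what gives majority voting room to improve on the individual members.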
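The jackknife test is leave-one-out cross-validation. Each protein is held out once, the classifier is trained on all remaining proteins, and the held-out protein is then predicted. The sketch below uses a 1-nearest-neighbour rule with Euclidean distance as a stand-in base learner. The names `jackknife_accuracy` and `nearest_neighbor` and the choice of distance are illustrative assumptions, not the paper's settings.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
# A classifier takes (training vectors, training labels, query vector) -> label.
Classifier = Callable[[Sequence[Vector], Sequence[str], Vector], str]


def nearest_neighbor(train_x: Sequence[Vector], train_y: Sequence[str], x: Vector) -> str:
    """1-NN with Euclidean distance (a hypothetical stand-in learner)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]


def jackknife_accuracy(
    xs: Sequence[Vector], ys: Sequence[str], classify: Classifier = nearest_neighbor
) -> float:
    """Fraction of proteins predicted correctly when each is left out in turn."""
    if len(xs) < 2 or len(xs) != len(ys):
        raise ValueError("need at least two labelled samples")
    correct = 0
    for i in range(len(xs)):
        # Train on every sample except the i-th, then test on the i-th.
        train_x = [x for j, x in enumerate(xs) if j != i]
        train_y = [y for j, y in enumerate(ys) if j != i]
        if classify(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)
```

Because every protein serves as a test sample exactly once, the jackknife result is deterministic for a given dataset. This is one reason it is commonly reported alongside an independent dataset test.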