ProClusEnsem: Predicting membrane protein types by fusing different modes of pseudo amino acid composition

  • Authors:
  • Jingyan Wang;Yongping Li;Quanquan Wang;Xinge You;Jiaju Man;Chao Wang;Xin Gao

  • Affiliations:
  • Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah 21534, Saudi Arabia and Shanghai Institute of Applied Physics, Chinese Acade ...;Shanghai Institute of Applied Physics, Chinese Academy of Science, 2019 Jialuo Road, Jiading District, Shanghai 201800, PR China and Shanghai Key Laboratory of Intelligent Information Processing, ...;Shanghai Institute of Applied Physics, Chinese Academy of Science, 2019 Jialuo Road, Jiading District, Shanghai 201800, PR China;Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China and Key Laboratory of High Performance Computing and Stochastic Inf ...;Key Laboratory of High Performance Computing and Stochastic Information Processing, Ministry of Education of China, College of Mathematics and Computer Science, Hunan Normal University, Changsha, ...;Department of Biomedical Engineering, Oregon Health and Science University, 20000 NW Walker Rd., Beaverton, OR 97006, USA;Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah 21534, Saudi Arabia

  • Venue:
  • Computers in Biology and Medicine
  • Year:
  • 2012

Abstract

Knowing the type of an uncharacterized membrane protein often provides a useful clue in both basic research and drug discovery. With the explosion of protein sequences generated in the post-genomic era, determining membrane protein types by experimental methods is expensive and time-consuming. It has therefore become important to develop automated methods for finding the possible types of membrane proteins. In view of this, various computational methods for predicting membrane protein types have been proposed. They extract protein feature vectors, such as PseAAC (pseudo amino acid composition) and PsePSSM (pseudo position-specific scoring matrix), to represent the protein sequence, and then learn a distance metric for the KNN (K nearest neighbor) or NN (nearest neighbor) classifier to predict the final type. Most of these metrics are learned using linear dimensionality reduction algorithms such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Such metrics are global: the same metric is applied to every protein in the dataset. In effect, they assume that the proteins follow a single uniform distribution that a linear dimensionality reduction algorithm can capture. We question this assumption and instead learn local metrics, each optimized for a local subset of the proteins. Metric learning is alternated iteratively with protein clustering. A novel ensemble distance metric is then obtained by combining the local metrics through Tikhonov regularization. Experimental results on a benchmark dataset demonstrate the feasibility and effectiveness of the proposed algorithm, named ProClusEnsem.
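The abstract does not give the formulation, so the following is only a minimal sketch of the general idea (cluster the proteins, learn one metric per cluster, blend the local metrics, classify with NN), not the authors' ProClusEnsem algorithm. Several choices here are assumptions made for illustration: the local metric is a ridge-regularized inverse of the within-class scatter (one simple use of a Tikhonov term), the clustering is a k-means-style loop, the ensemble weights are soft memberships based on Euclidean distance to the cluster centres, and the names `local_metric`, `fit_local_metrics`, `ensemble_metric`, and `predict_nn` are hypothetical. The rows of `X` stand in for feature vectors such as PseAAC.

```python
import numpy as np


def local_metric(X, y, lam):
    """Assumed local metric: inverse within-class scatter with a Tikhonov (ridge) term."""
    d = X.shape[1]
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = Xc - Xc.mean(axis=0)
        Sw += diff.T @ diff
    Sw /= len(X)
    M = np.linalg.inv(Sw + lam * np.eye(d))  # lam > 0 keeps the matrix invertible
    return M * (d / np.trace(M))             # rescale so local metrics are comparable


def fit_local_metrics(X, y, n_clusters=2, lam=1.0, n_iter=5, seed=0):
    """Alternate between clustering the proteins and learning one metric per cluster."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    centers = X[rng.choice(len(X), n_clusters, replace=False)].copy()
    metrics = [np.eye(d) for _ in range(n_clusters)]
    for _ in range(n_iter):
        # Assign each protein to the closest centre under that centre's own metric.
        dist = np.stack(
            [
                np.einsum("nd,de,ne->n", X - centers[k], metrics[k], X - centers[k])
                for k in range(n_clusters)
            ],
            axis=1,
        )
        assign = dist.argmin(axis=1)
        for k in range(n_clusters):
            members = assign == k
            if members.sum() < 2:
                continue  # too few members: keep the previous centre and metric
            centers[k] = X[members].mean(axis=0)
            metrics[k] = local_metric(X[members], y[members], lam)
    return centers, metrics


def ensemble_metric(x, centers, metrics):
    """Blend the local metrics with soft weights (an assumed weighting scheme)."""
    d2 = ((centers - x) ** 2).sum(axis=1)
    w = np.exp(-(d2 - d2.min()))  # subtract the minimum for numerical stability
    w /= w.sum()
    return sum(wk * Mk for wk, Mk in zip(w, metrics))


def predict_nn(x, X_train, y_train, centers, metrics):
    """Nearest-neighbour prediction under the ensemble metric for query x."""
    M = ensemble_metric(x, centers, metrics)
    diff = X_train - x
    dist = np.einsum("nd,de,ne->n", diff, M, diff)
    return y_train[dist.argmin()]
```

Because each local metric is the inverse of a symmetric positive definite matrix and the ensemble is a convex combination of such matrices, the blended metric is itself a valid Mahalanobis-style metric. A global PCA or LDA metric corresponds to the special case of a single cluster.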