Prediction of mitochondrial matrix protein structures based on feature selection and fragment assembly

  • Authors:
  • Gualberto Asencio-Cortés;Jesús S. Aguilar-Ruiz;Alfonso E. Márquez-Chamorro;Roberto Ruiz;Cosme E. Santiesteban-Toca

  • Affiliations:
  • School of Engineering, Pablo de Olavide University, Seville, Spain;School of Engineering, Pablo de Olavide University, Seville, Spain;School of Engineering, Pablo de Olavide University, Seville, Spain;School of Engineering, Pablo de Olavide University, Seville, Spain;Centro de Bioplantas, University of Ciego de Ávila, Cuba

  • Venue:
  • EvoBIO'12 Proceedings of the 10th European conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
  • Year:
  • 2012

Abstract

Protein structure prediction consists of determining the three-dimensional conformation of a protein based only on its amino acid sequence. This remains a difficult and significant challenge in structural bioinformatics, because these structures are needed for drug design. This work proposes a method that reconstructs protein structures from protein fragments assembled according to their physico-chemical similarities, using information extracted from known protein structures. Our prediction system produces distance maps to represent protein structures; these provide more information than the contact maps predicted by many proposals in the literature. The most commonly used amino acid physico-chemical properties are hydrophobicity, polarity and charge. In our method, we performed feature selection on the 544 properties of the AAindex repository, resulting in 16 properties that were used for prediction. We tested our proposal on 74 mitochondrial matrix proteins with a maximum sequence identity of 30%, obtained from the Protein Data Bank. We achieved a recall of 0.80 and a precision of 0.79 with an 8-angstrom cut-off and a minimum sequence separation of 7 amino acids. Finally, we compared our system with another relevant proposal on the same benchmark and achieved a recall improvement of 50.82%. Therefore, for the studied proteins, our method provides a notable improvement in terms of recall.
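
The evaluation described above (a distance map thresholded at 8 angstroms, restricted to residue pairs at least 7 positions apart, then scored by precision and recall) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of one representative atom per residue, the inclusive cut-off (distance <= 8) and the inclusive separation (|i - j| >= 7) are all assumptions made for illustration.

```python
import math


def distance_map(coords):
    """Pairwise Euclidean distances (in angstroms) between residue coordinates.

    `coords` is a list of (x, y, z) tuples, one per residue (assumed here to be
    a single representative atom per residue; the abstract does not specify).
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contacts(dmap, cutoff=8.0, min_sep=7):
    """Derive a contact set from a distance map.

    Returns the pairs (i, j) with j - i >= min_sep and distance <= cutoff.
    Whether the bounds are inclusive is an assumption of this sketch.
    """
    n = len(dmap)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if dmap[i][j] <= cutoff
    }


def precision_recall(predicted, actual):
    """Precision and recall of a predicted contact set against the real one."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```

In use, `contacts(distance_map(native_coords))` and `contacts(predicted_map)` would each yield a set of residue pairs, and `precision_recall` compares the two. Because contacts are derived from distances by thresholding, a distance map carries strictly more information than the contact map obtained from it.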