Prediction of mitochondrial matrix protein structures based on feature selection and fragment assembly

Authors:
Gualberto Asencio-Cortés; Jesús S. Aguilar-Ruiz; Alfonso E. Márquez-Chamorro; Roberto Ruiz; Cosme E. Santiesteban-Toca
Affiliations:
School of Engineering, Pablo de Olavide University, Seville, Spain (Asencio-Cortés, Aguilar-Ruiz, Márquez-Chamorro, Ruiz); Centro de Bioplantas, University of Ciego de Ávila, Cuba (Santiesteban-Toca)
Venue:
EvoBIO'12 Proceedings of the 10th European conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
Year:
2012

Abstract

Protein structure prediction consists of determining the three-dimensional conformation of a protein from its amino acid sequence alone. This remains a difficult and important challenge in structural bioinformatics, because such structures are needed for drug design. This work proposes a method that reconstructs protein structures by assembling protein fragments according to their physico-chemical similarities, using information extracted from known protein structures. Our prediction system represents protein structures as distance maps, which provide more information than the contact maps predicted by many proposals in the literature. The most commonly used physico-chemical properties of amino acids are hydrophobicity, polarity and charge. In our method, we performed feature selection over the 544 properties in the AAindex repository, which yielded 16 properties that were then used for prediction. We tested our proposal on 74 mitochondrial matrix proteins from the Protein Data Bank with a maximum sequence identity of 30%. We achieved a recall of 0.80 and a precision of 0.79 using an 8 Å cut-off and a minimum sequence separation of 7 amino acids. Finally, we compared our system with another relevant proposal on the same benchmark and achieved a 50.82% improvement in recall. Therefore, for the studied proteins, our method provides a notable improvement in terms of recall.
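The evaluation reported in the abstract (precision and recall at an 8 Å cut-off with a minimum sequence separation of 7 residues) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each residue is represented by a single 3-D point (for example its Cα atom), that a pair counts as a contact when its distance is strictly below the cut-off, and that the separation rule means j − i ≥ 7. All function names (`distance_map`, `contacts`, `precision_recall`, `toy_map`) are hypothetical. Both the predicted and the native distance maps are thresholded into contact sets, and the two sets are then compared.

```python
import math
from typing import List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]
DistanceMap = List[List[float]]


def distance_map(coords: Sequence[Coord]) -> DistanceMap:
    """Pairwise Euclidean distances between residues.

    Assumption: one 3-D point per residue (e.g. its C-alpha atom), in angstroms.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contacts(dmap: DistanceMap, cutoff: float = 8.0, min_sep: int = 7) -> Set[Tuple[int, int]]:
    """Residue pairs (i, j) with j - i >= min_sep and distance strictly below cutoff.

    Assumptions: '<' rather than '<=' at the cut-off, and '>=' for the separation.
    """
    n = len(dmap)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)  # skip pairs that are too close in sequence
        if dmap[i][j] < cutoff
    }


def precision_recall(predicted: DistanceMap, native: DistanceMap,
                     cutoff: float = 8.0, min_sep: int = 7) -> Tuple[float, float]:
    """Compare the contacts derived from a predicted and a native distance map."""
    pred = contacts(predicted, cutoff, min_sep)
    true = contacts(native, cutoff, min_sep)
    tp = len(pred & true)  # contacts present in both maps
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    return precision, recall


def toy_map(n: int, close_pairs: Sequence[Tuple[int, int]],
            near: float = 5.0, far: float = 20.0) -> DistanceMap:
    """Hypothetical toy distance map: every pair is `far` apart except `close_pairs`."""
    d = [[0.0 if i == j else far for j in range(n)] for i in range(n)]
    for i, j in close_pairs:
        d[i][j] = d[j][i] = near
    return d


if __name__ == "__main__":
    native = toy_map(12, [(0, 8), (1, 9), (2, 10), (3, 4)])  # (3, 4) is below min_sep
    predicted = toy_map(12, [(0, 8), (4, 11), (5, 6)])       # (5, 6) is below min_sep
    p, r = precision_recall(predicted, native)
    print((round(p, 3), round(r, 3)))  # → (0.5, 0.333)
```

Because a distance map keeps the actual distances, the same predicted map can be re-scored at any threshold simply by changing `cutoff`, whereas a contact map is fixed to the threshold at which it was predicted.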