MCPR'12 Proceedings of the 4th Mexican conference on Pattern Recognition
In this paper, we focus on protein contact map prediction, one of the most important intermediate steps of the protein folding problem. We describe a method in which the contact maps of proteins are predicted with decision trees. The input coding is derived from all amino acid pairs that occur in the training data set. The algorithm produces a model of 400 decision trees (one for each possible amino acid pair) that takes into account the amino acid frequencies in the subsequence lying between the two amino acids under analysis. To evaluate the generalization capability of the method, we carry out an experiment on 173 nonhomologous proteins of known structure, selected from the Protein Data Bank (PDB). Our results indicate that the method assigns protein contacts with an average accuracy of 0.34, compared with the 0.25 obtained by the FNETCSS method. These results suggest that our algorithm is more accurate than the methods compared, especially as protein length increases.
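The input coding described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_sep` parameter, the use of ordered pairs (20 × 20 = 400), and the choice of relative rather than absolute frequencies are all assumptions made for illustration. The sketch only builds one training set per amino acid pair; a decision-tree learner would then be fitted to each set.

```python
from collections import Counter, defaultdict
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# 20 x 20 = 400 ordered pairs, i.e. one decision tree slot per pair.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def between_frequencies(seq, i, j):
    """Relative frequency of each amino acid strictly between positions i and j.

    Assumption: frequencies are normalised by the segment length; the source
    does not say whether raw counts or relative frequencies are used.
    """
    segment = seq[i + 1:j]
    counts = Counter(segment)
    n = len(segment)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def build_training_sets(seq, contacts, min_sep=1):
    """Group (features, label) examples by the residue pair they belong to.

    seq      -- amino acid sequence as a string of one-letter codes
    contacts -- set of (i, j) index pairs with i < j that are in contact
    min_sep  -- hypothetical minimum sequence separation between i and j
    """
    groups = defaultdict(list)
    for i in range(len(seq)):
        for j in range(i + min_sep, len(seq)):
            key = seq[i] + seq[j]  # selects which of the 400 trees is trained
            features = between_frequencies(seq, i, j)
            groups[key].append((features, (i, j) in contacts))
    return groups
```

Each value in the returned dictionary is the training set for one tree. Grouping by pair keeps every tree's examples specific to one combination of residues, which is how the 400-tree model described in the abstract is organised.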