On the convergence of protein structure and dynamics: statistical learning studies of pseudo folding pathways

Authors:
Alessandro Vullo;Andrea Passerini;Paolo Frasconi;Fabrizio Costa;Gianluca Pollastri
Affiliations:
School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland;Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Firenze, Italy;Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Firenze, Italy;Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Firenze, Italy;School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
Venue:
EvoBIO'08 Proceedings of the 6th European conference on Evolutionary computation, machine learning and data mining in bioinformatics
Year:
2008

Abstract

Many algorithms that attempt to predict a protein's native structure from its sequence must generate a large set of hypotheses to ensure that nearly correct structures are included, which raises the problem of assessing the quality of alternative 3D conformations. This problem has mostly been approached by focusing on the final 3D conformation, with machine learning techniques playing a leading role. We argue in this paper that additional information for recognising native-like structures can be obtained by regarding the final conformation as the result of a generative process reminiscent of the folding process that generates structures in nature. We introduce a coarse representation of protein pseudo-folding based on binary trees and define a kernel function for assessing their similarity. Kernel-based analysis techniques empirically demonstrate a significant correlation between the information contained in pseudo-folding trees and features of native folds in a large, non-redundant set of proteins.
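The abstract does not spell out the kernel itself. As a rough illustration of how similarity between labeled binary trees can be computed, the following is a minimal sketch of a Collins-Duffy style subset-tree kernel. This is a standard convolution kernel for trees and is not necessarily the kernel introduced in the paper. The node labels (leaves "H", "E", "C" for secondary-structure types, internal nodes "M" for a merge step), the class `Node`, and the functions `subset_tree_kernel` and `normalized_kernel` are hypothetical and chosen only for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Node of a hypothetical binary pseudo-folding tree.

    Assumption (not from the paper): leaves carry a secondary-structure
    label such as "H", "E" or "C"; internal nodes carry "M" for a merge step.
    """
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def children(self) -> tuple:
        return tuple(c for c in (self.left, self.right) if c is not None)

    def production(self) -> tuple:
        # A node's "production" is its label plus the ordered labels of its children.
        return (self.label, tuple(c.label for c in self.children()))


def all_nodes(tree: Node) -> list:
    """Collect every node of the tree with an iterative depth-first traversal."""
    out, stack = [], [tree]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children())
    return out


def subset_tree_kernel(t1: Node, t2: Node, lam: float = 0.5) -> float:
    """Collins-Duffy style subset-tree kernel with decay factor lam."""
    memo = {}

    def common(a: Node, b: Node) -> float:
        # Weighted count of common tree fragments rooted at nodes a and b.
        key = (id(a), id(b))
        if key in memo:
            return memo[key]
        if a.production() != b.production():
            value = 0.0
        elif not a.children():
            value = lam
        else:
            value = lam
            for ca, cb in zip(a.children(), b.children()):
                value *= 1.0 + common(ca, cb)
        memo[key] = value
        return value

    # Sum the fragment counts over every pair of nodes from the two trees.
    return sum(common(a, b) for a in all_nodes(t1) for b in all_nodes(t2))


def normalized_kernel(t1: Node, t2: Node, lam: float = 0.5) -> float:
    """Cosine-normalize so that every tree has similarity 1.0 with itself."""
    k12 = subset_tree_kernel(t1, t2, lam)
    k11 = subset_tree_kernel(t1, t1, lam)
    k22 = subset_tree_kernel(t2, t2, lam)
    return k12 / math.sqrt(k11 * k22)


if __name__ == "__main__":
    t1 = Node("M", Node("H"), Node("E"))  # helix merged with strand
    t2 = Node("M", Node("E"), Node("H"))  # same elements, children swapped
    print(subset_tree_kernel(t1, t1))  # 2.125
    print(subset_tree_kernel(t1, t2))  # 1.0
```

Because this kernel compares ordered productions, swapping the two children of a merge node lowers the similarity, as the two example trees show; whether child order should matter for pseudo-folding trees is a modeling choice. Cosine normalization keeps trees of different sizes on a comparable scale.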