Prediction of unfolded segments in a protein sequence based on amino acid composition

Authors:
Karen Coeytaux;Anne Poupon
Affiliations:
Yeast Structural Genomics, IBBMC, Bat 430, Université Paris-Sud 91405 Orsay Cedex, France;Yeast Structural Genomics, IBBMC, Bat 430, Université Paris-Sud 91405 Orsay Cedex, France
Venue:
Bioinformatics
Year:
2005

Citing 0
Cited 2

Reducing overfitting in predicting intrinsically unstructured proteins
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining

Prediction of disorder with new computational tool: BVDEA
Expert Systems with Applications: An International Journal

Quantified Score

Hi-index	3.85


Abstract

Motivation: Partially and wholly unstructured proteins have now been identified in all kingdoms of life, and more commonly in eukaryotic organisms. This intrinsic disorder is related to certain critical functions. Apart from their fundamental interest, unstructured regions in proteins may prevent crystallization. Predicting disordered regions is therefore important for understanding protein function, and may also help in designing genetic constructs.

Results: In this paper we present a computational tool for the detection of unstructured regions in proteins based on two properties of unfolded fragments: (1) disordered regions have a biased composition and (2) they usually contain either small or no hydrophobic clusters. To quantify these two properties, we first calculate the amino acid distributions in structured and unstructured regions. Using these distributions, we calculate, for a given sequence fragment, the probability that it belongs to a structured region and the probability that it belongs to an unstructured region. For each amino acid, the distance to the nearest hydrophobic cluster is also computed. Using these three values along a protein sequence allows us to predict unstructured regions with very simple rules. This method requires only the primary sequence, and no multiple alignment, which makes it suitable for orphan proteins.

Availability: http://genomics.eu.org/

Contact: Anne.Poupon@ibbmc.u-psud.fr
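The three per-residue values described in the Results can be illustrated with a short Python sketch. This is not the authors' implementation: the abstract does not give the formulas, so everything specific here is an assumption made for illustration. That covers the sliding-window average of log frequencies used as a stand-in for the fragment probabilities, the hydrophobic alphabet `VILFMYW`, the definition of a hydrophobic cluster as a run of at least `min_size` consecutive hydrophobic residues, the decision rule in `predict_disorder`, and all function and parameter names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("VILFMYW")  # assumed hydrophobic alphabet, not from the paper


def composition(fragments, pseudocount=1.0):
    """Amino acid frequencies over a set of fragments, smoothed with a pseudocount."""
    counts = Counter(aa for frag in fragments for aa in frag if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def window_log_likelihood(seq, freqs, half_window=5):
    """Mean log frequency of the residues in a window centred on each position."""
    scores = []
    for i in range(len(seq)):
        window = seq[max(0, i - half_window): i + half_window + 1]
        logs = [math.log(freqs[aa]) for aa in window if aa in freqs]
        scores.append(sum(logs) / len(logs) if logs else 0.0)
    return scores


def hydrophobic_cluster_distance(seq, min_size=3):
    """Distance from each residue to the nearest residue inside a hydrophobic run.

    A cluster is assumed to be a run of at least `min_size` consecutive
    hydrophobic residues. If the sequence has no cluster, len(seq) is returned
    for every position.
    """
    in_cluster = [False] * len(seq)
    start = None
    for i, aa in enumerate(seq + "-"):  # sentinel closes a trailing run
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_size:
                for j in range(start, i):
                    in_cluster[j] = True
            start = None
    positions = [i for i, flag in enumerate(in_cluster) if flag]
    if not positions:
        return [len(seq)] * len(seq)
    return [min(abs(i - p) for p in positions) for i in range(len(seq))]


def predict_disorder(seq, freq_dis, freq_ord, half_window=5, min_distance=2):
    """Hypothetical rule: call a residue disordered when the unstructured
    composition fits its window better than the structured one AND it lies
    at least `min_distance` residues from any hydrophobic cluster."""
    ll_dis = window_log_likelihood(seq, freq_dis, half_window)
    ll_ord = window_log_likelihood(seq, freq_ord, half_window)
    dist = hydrophobic_cluster_distance(seq)
    return [d > o and k >= min_distance for d, o, k in zip(ll_dis, ll_ord, dist)]


if __name__ == "__main__":
    # Toy training fragments, invented for illustration only.
    freq_dis = composition(["GSGSGSEEKK"])
    freq_ord = composition(["VILFVILAWL"])
    print(predict_disorder("GSGSGSVILVILGSGSGS", freq_dis, freq_ord, half_window=1))
```

In practice the two frequency tables would be estimated from annotated structured and unstructured regions rather than from toy fragments, and the window size and distance threshold would need to be tuned against such data.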