Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data

Authors:
Jianlin Cheng;Michael J. Sweredoski;Pierre Baldi
Affiliations:
School of Information and Computer Science, Institute for Genomics and Bioinformatics, University of California Irvine, Irvine, USA 92697 (all authors)
Venue:
Data Mining and Knowledge Discovery
Year:
2005

Abstract

Intrinsically disordered regions in proteins are relatively frequent and important for our understanding of molecular recognition and assembly, and of protein structure and function. From an algorithmic standpoint, flagging large disordered regions is also important for ab initio protein structure prediction methods. Here we first extract a curated, non-redundant data set of protein disordered regions from the Protein Data Bank and compute relevant statistics on the length and location of these regions. We then develop an ab initio predictor of disordered regions called DISpro, which uses evolutionary information in the form of profiles, predicted secondary structure and relative solvent accessibility, and ensembles of 1D-recursive neural networks. DISpro is trained and cross-validated using the curated data set. The experimental results show that DISpro achieves an accuracy of 92.8% with a false positive rate of 5%. DISpro is a member of the SCRATCH suite of protein data mining tools available through http://www.igb.uci.edu/servers/psss.html.
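
The accuracy and false positive rate quoted above are per-residue quantities, and a short sketch may help make their relationship concrete. The code below is illustrative only and is not DISpro's implementation: the function name `disorder_metrics`, the 0.5 decision threshold, and the toy scores are all assumptions introduced here. It treats "disordered" as the positive class, thresholds a per-residue disorder score, and computes accuracy as (TP + TN) / N and false positive rate as FP / (FP + TN).

```python
from typing import Dict, Sequence


def disorder_metrics(
    true_labels: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, float]:
    """Per-residue accuracy and false positive rate.

    true_labels: 1 for a disordered residue, 0 for an ordered one.
    scores: predicted disorder scores in [0, 1], one per residue.
    """
    if len(true_labels) != len(scores):
        raise ValueError("true_labels and scores must have the same length")
    if not true_labels:
        raise ValueError("at least one residue is required")

    tp = fp = tn = fn = 0
    for label, score in zip(true_labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    ordered = fp + tn  # number of truly ordered residues
    return {
        "accuracy": (tp + tn) / len(true_labels),
        # Fraction of ordered residues wrongly flagged as disordered.
        "false_positive_rate": fp / ordered if ordered else 0.0,
    }


# Toy example: ten residues, two of them truly disordered.
toy_labels = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
toy_scores = [0.1, 0.2, 0.1, 0.7, 0.9, 0.3, 0.2, 0.1, 0.0, 0.1]
toy_result = disorder_metrics(toy_labels, toy_scores)
```

Raising the threshold in such a scheme generally lowers the false positive rate at the cost of missing more disordered residues, which is why an accuracy figure is most informative when reported together with the false positive rate, as the abstract does.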