Understanding protein dispensability through machine-learning analysis of high-throughput data

Authors:
Yu Chen; Dong Xu
Affiliations:
UT-ORNL Graduate School of Genome Science and Technology, Oak Ridge, TN, USA; UT-ORNL Graduate School of Genome Science and Technology, Oak Ridge, TN, USA
Venue:
Bioinformatics
Year:
2005

Abstract

Motivation: Protein dispensability is fundamental to understanding gene function and evolution. Recent advances in generating high-throughput data, such as genomic sequence data, protein-protein interaction data, gene-expression data and growth-rate data of mutants, allow us to investigate protein dispensability systematically at the genome scale.

Results: In our studies, protein dispensability is represented as a fitness score measured by the growth rate of gene-deletion mutants. Through analyses of high-throughput data in the yeast Saccharomyces cerevisiae, we found that a protein's dispensability correlated significantly with its evolutionary rate and duplication rate, as well as with its connectivity in the protein-protein interaction network and the gene-expression correlation network. Neural networks and support vector machines were applied to predict protein dispensability from high-throughput data. Our studies shed some light on the global characteristics of protein dispensability and evolution.

Availability: The original datasets for protein dispensability analysis and prediction, together with related scripts, are available at http://digbio.missouri.edu/~ychen/ProDispen/

Contact: xudong@missouri.edu
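
The correlation between a protein's fitness score and its connectivity in an interaction network, as described in the Results, can be illustrated with a minimal sketch. The abstract does not say which correlation statistic the authors used, so Spearman rank correlation is an assumption here. The protein names, edges and fitness values are invented toy data, and every function name is hypothetical. The sign and size of the resulting coefficient carry no biological meaning.

```python
from collections import Counter


def degrees(edges):
    """Count interaction partners (connectivity) for each protein in an edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (invented for illustration, not taken from the paper's datasets).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
toy_fitness = {"A": 0.2, "B": 0.7, "C": 0.8, "D": 0.6, "E": 1.0}

deg = degrees(toy_edges)
proteins = sorted(toy_fitness)
rho = spearman([deg[p] for p in proteins], [toy_fitness[p] for p in proteins])
print(f"Spearman rho between connectivity and fitness (toy data): {rho:.3f}")
```

With real data, the edge list would come from a protein-protein interaction dataset and the fitness scores from growth-rate measurements of gene-deletion mutants. A significance test would be needed before drawing any conclusion.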