Motivation: Protein dispensability is fundamental to understanding gene function and evolution. Recent advances in generating high-throughput data, such as genomic sequence data, protein-protein interaction data, gene-expression data and growth-rate data of mutants, allow us to investigate protein dispensability systematically at the genome scale.

Results: In our studies, protein dispensability is represented as a fitness score measured by the growth rate of gene-deletion mutants. Through analyses of high-throughput data in the yeast Saccharomyces cerevisiae, we found that a protein's dispensability correlates significantly with its evolutionary rate and duplication rate, as well as with its connectivity in the protein-protein interaction network and the gene-expression correlation network. Neural networks and support vector machines were applied to predict protein dispensability from high-throughput data. Our studies shed some light on the global characteristics of protein dispensability and evolution.

Availability: The original datasets for protein dispensability analysis and prediction, together with related scripts, are available at http://digbio.missouri.edu/~ychen/ProDispen/

Contact: xudong@missouri.edu
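
As an illustration of the kind of correlation analysis described in the Results, the sketch below computes a rank correlation between a per-protein fitness score and one feature (here, interaction-network connectivity). It is a minimal sketch under stated assumptions: the abstract does not say which correlation coefficient was used, so Spearman's rank correlation is an assumed choice, and the names `fitness` and `degree` and all values are made up for illustration rather than taken from the published datasets.

```python
from math import sqrt


def rank(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical toy data (NOT from the paper): one entry per protein.
fitness = [1.00, 0.95, 0.80, 0.60, 0.30]  # growth-rate-based fitness score
degree = [1, 2, 5, 9, 20]                 # number of interaction partners

rho = spearman(fitness, degree)
print(f"Spearman rho = {rho:.2f}")
```

The same `spearman` function could be applied to any of the other features named above (evolutionary rate, duplication rate, expression-network connectivity) by swapping in the corresponding per-protein values.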